WorldWideScience

Sample records for gene family annotation

  1. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with abundant repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful for studying an organism, so high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. Using these very large amounts of sequence data to annotate or re-annotate genes in a timely fashion requires an automated pipeline. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present the genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice; the exception is chlamy (Chlamydomonas), whose annotation was done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  2. Ixodes scapularis tick serine proteinase inhibitor (serpin) gene family: annotation and transcriptional analysis

    Directory of Open Access Journals (Sweden)

    Chalaire Katelyn C

    2009-05-01

    Background: Serine proteinase inhibitors (serpins) are a large superfamily of structurally related but functionally diverse proteins that control essential proteolytic pathways in most branches of life. Given their importance in the biology of many organisms, the idea that ticks might use serpins to evade host defenses, and that immunizing against or disrupting serpin function could serve as a tick control strategy, is appealing. Results: A sequence homology search strategy allowed us to identify at least 45 tick serpin genes in the Ixodes scapularis genome, which are structurally segregated into 32 intronless and 13 intron-containing genes. Nine of the intron-containing serpins occur in a cluster of 11 genes that spans 170 kb of DNA sequence. Based on consensus amino acid residues in the reactive center loop (RCL) and signal peptide scanning, 93% are putatively inhibitory while 82% are putatively extracellular. Among the 11 different amino acid residues that are predicted at the P1 sites, 16 sequences possess basic amino acid (R/K) residues. Temporal and spatial expression analyses revealed that 40 of the 45 serpins are differentially expressed in salivary glands (SG) and/or midguts (MG) of unfed and partially fed ticks. Ten of the 38 serpin genes were expressed from six to 24 hrs of feeding, while six and five genes are predominantly or exclusively expressed in MG and SG, respectively. Conclusion: Given the diversity among tick species, sizes of tick serpin families are likely to be variable. However, this study provides insight into the potential sizes of serpin protein families in ticks. Ticks must overcome inflammation, complement activation and blood coagulation to complete feeding. Since these pathways are regulated by serpins that have basic residues at their P1 sites, we speculate that I. scapularis may utilize some of the serpins reported in this study to manipulate host defense. We have discussed our data in the context of

  3. Genome-wide annotation of the soybean WRKY family and functional characterization of genes involved in response to Phakopsora pachyrhizi infection.

    Science.gov (United States)

    Bencke-Malato, Marta; Cabreira, Caroline; Wiebke-Strohm, Beatriz; Bücker-Neto, Lauro; Mancini, Estefania; Osorio, Marina B; Homrich, Milena S; Turchetto-Zolet, Andreia Carina; De Carvalho, Mayra C C G; Stolf, Renata; Weber, Ricardo L M; Westergaard, Gastón; Castagnaro, Atílio P; Abdelnoor, Ricardo V; Marcelino-Guimarães, Francismar C; Margis-Pinheiro, Márcia; Bodanese-Zanettini, Maria Helena

    2014-09-10

    Many previous studies have shown that soybean WRKY transcription factors are involved in the plant response to biotic and abiotic stresses. Phakopsora pachyrhizi is the causal agent of Asian Soybean Rust, one of the most important soybean diseases. There is evidence that WRKYs are involved in the resistance of some soybean genotypes against that fungus. However, WRKY genes were underrepresented in the existing soybean genome annotation. In the present study, a genome-wide annotation of the soybean WRKY family was carried out and members involved in the response to P. pachyrhizi were identified. A search of soybean genomic databases led to the annotation of 182 WRKY-encoding genes and the identification of 33 putative pseudogenes. Genes involved in the response to P. pachyrhizi infection were identified using superSAGE, RNA-Seq of microdissected lesions and microarray experiments. Seventy-five genes were differentially expressed during fungal infection. The expression of eight WRKY genes was validated by RT-qPCR. In response to P. pachyrhizi infection, these genes were expressed earlier and/or more strongly in a resistant genotype than in a susceptible genotype. Soybean somatic embryos were transformed in order to overexpress or silence WRKY genes. Embryos overexpressing a WRKY gene were obtained, but they were unable to convert into plants. When infected with P. pachyrhizi, the leaves of the silenced transgenic line showed a higher number of lesions than the wild-type plants. The present study reports a genome-wide annotation of the soybean WRKY family. The participation of some members in the response to P. pachyrhizi infection was demonstrated. The results contribute to the elucidation of gene function and suggest the manipulation of WRKYs as a strategy to increase fungal resistance in soybean plants.

  4. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Directory of Open Access Journals (Sweden)

    Toub Omid

    2010-10-01

    Background: Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly 1. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results: We present findings from the analysis of the updated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions: The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information

  5. COGNATE: comparative gene annotation characterizer.

    Science.gov (United States)

    Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver

    2017-07-17

    The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy to use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on github ( https

  6. Annotation, Phylogeny and Expression Analysis of the Nuclear Factor Y Gene Families in Common Bean (Phaseolus vulgaris)

    Directory of Open Access Journals (Sweden)

    Carolina Rípodas

    2015-01-01

    In the past decade, plant nuclear factor Y (NF-Y) genes have gained major interest due to their roles in many biological processes in plant development or adaptation to environmental conditions, particularly in the root nodule symbiosis established between legume plants and nitrogen-fixing bacteria. NF-Ys are heterotrimeric transcriptional complexes composed of three subunits, NF-YA, NF-YB and NF-YC, which bind with high affinity and specificity to the CCAAT box, a cis element present in many eukaryotic promoters. In plants, NF-Y subunits are encoded by gene families with about ten members each. In this study, we have identified and characterized the NF-Y gene families of common bean (Phaseolus vulgaris), a grain legume of worldwide economic importance and the main source of dietary protein in developing countries. Expression analysis showed that some members of each family are up-regulated at early or late stages of the nitrogen-fixing symbiotic interaction with its partner Rhizobium etli. We also showed that some genes are differentially accumulated in response to inoculation with high or less efficient R. etli strains, making them excellent candidates to participate in the strain-specific response during symbiosis. Genes of the NF-YA family exhibit a highly structured intron-exon organization. Moreover, this family is characterized by the presence of upstream ORFs when introns in the 5' UTR are retained and of miRNA target sites in the 3' UTR, suggesting that these genes might be subject to complex post-transcriptional regulation. Multiple protein alignments indicated the presence of highly conserved domains in each of the NF-Y families, presumably involved in subunit interactions and DNA binding. The analysis presented here constitutes a starting point to understand the regulation and biological function of individual members of the NF-Y families in different developmental processes in this grain legume.

  7. Discovering gene annotations in biomedical text databases

    Directory of Open Access Journals (Sweden)

    Ozsoyoglu Gultekin

    2008-03-01

    Background: Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools to annotate genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results: In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products from article abstracts in PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion: GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme than "exact matching", with the advantage of locating approximate

  8. GoGene: gene annotation in the fast lane.

    Science.gov (United States)

    Plake, Conrad; Royer, Loic; Winnenburg, Rainer; Hakenberg, Jörg; Schroeder, Michael

    2009-07-01

    High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched GeneOntology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the GeneOntology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining that extracts co-occurrences of genes and ontology terms from the literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms extracted from more than 18,000,000 PubMed entries. It covers not only the process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.

  9. Combining gene prediction methods to improve metagenomic gene annotation

    Directory of Open Access Journals (Sweden)

    Rosen Gail L

    2011-01-01

    Background: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results: We not only analyze the programs' performance at different read lengths, like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false positives and false negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
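
The combination rule stated in the conclusions of this record can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-read boolean calls, the name "Predictor3" for the third (unnamed) program, and the single 400 bp cutoff applied to every read length, including the 401-499 bp range the abstract does not address, are all assumptions made for illustration.

```python
from typing import Dict

def majority_vote(calls: Dict[str, bool]) -> bool:
    """True when more than half of the predictors call the read coding."""
    return sum(calls.values()) * 2 > len(calls)

def intersection(calls: Dict[str, bool],
                 names=("GeneMark", "Orphelia")) -> bool:
    """True only when every named predictor calls the read coding."""
    return all(calls[name] for name in names)

def combined_call(read_length: int, calls: Dict[str, bool],
                  cutoff: int = 400) -> bool:
    """Majority vote for short reads, GeneMark/Orphelia intersection otherwise.

    The cutoff of 400 bp is an assumption; the abstract only states the rule
    for reads of 400 bp and shorter and for reads of 500 bp and longer.
    """
    if read_length <= cutoff:
        return majority_vote(calls)
    return intersection(calls)

# Hypothetical per-read calls; "Predictor3" is a placeholder name.
example_calls = {"GeneMark": True, "Orphelia": False, "Predictor3": True}
short_read_is_coding = combined_call(100, example_calls)
long_read_is_coding = combined_call(500, example_calls)
```

With two of three predictors voting "coding", the short read is accepted by majority vote, while the long read is rejected because GeneMark and Orphelia disagree.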

  10. Rfam: annotating families of non-coding RNA sequences.

    Science.gov (United States)

    Daub, Jennifer; Eberhardt, Ruth Y; Tate, John G; Burge, Sarah W

    2015-01-01

    The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.

  11. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    Science.gov (United States)

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  12. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Background: Magnaporthe oryzae is the causal agent of rice blast, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 (http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html). However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of the M. oryzae genome assembly. Methods: A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation, a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results: In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
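
The reciprocal best hits idea named in the methods of this record can be sketched as follows. This is a generic illustration, not the authors' pipeline: the input format (tuples of query, subject and alignment score), the function names, and the toy protein identifiers are assumptions, and the stringency filters and manual review described in the abstract are omitted.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, alignment score)

def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Toy data with hypothetical identifiers: genome A searched against B, and B against A.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0), ("A2", "B2", 120.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A1", 130.0), ("B2", "A2", 110.0)]
rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here A1 and B1 choose each other, so they form a reciprocal pair; A2's best hit is B2, but B2 prefers A1, so that pair is dropped.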

  13. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet or may be run locally if a large number of sequences are to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Installing and using annotation systems usually requires experience and is time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to use it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. At least 2 MB of free disk space is required.

  14. A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

    Science.gov (United States)

    Taşan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P

    2012-02-01

    The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.

  15. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool tolerates sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  16. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
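
As a rough illustration of the first step of a weighted coexpression analysis, the sketch below computes an unsigned soft-threshold adjacency, |cor(x, y)|^beta, for every gene pair. It is a simplified stand-in, not the study's workflow: the gene names and expression values are invented, beta = 6 is only an illustrative exponent, and module detection itself (clustering on the resulting network) is not shown.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def soft_threshold_adjacency(profiles: Dict[str, Sequence[float]],
                             beta: float = 6) -> Dict[Tuple[str, str], float]:
    """Unsigned weighted adjacency |cor|**beta for each unordered gene pair."""
    genes = sorted(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

# Hypothetical expression profiles across three conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],   # perfectly anti-correlated with geneA
    "geneC": [1.0, 3.0, 2.0],   # weakly correlated with geneA
}
adjacency = soft_threshold_adjacency(toy_profiles)
```

Raising the correlation to a power keeps strong connections near 1 while pushing weak ones toward 0, which is why the weakly correlated pair ends up with a much smaller weight than its raw correlation of 0.5.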

  17. Gene annotation from scientific literature using mappings between keyword systems.

    Science.gov (United States)

    Pérez, Antonio J; Perez-Iratxeta, Carolina; Bork, Peer; Thode, Guillermo; Andrade, Miguel A

    2004-09-01

    The description of genes in databases by keywords helps the non-specialist to quickly grasp the properties of a gene and increases the efficiency of computational tools that are applied to gene data (e.g. searching a gene database for sequences related to a particular biological process). However, the association of keywords to genes or protein sequences is a difficult process that ultimately implies examination of the literature related to a gene. To support this task, we present a procedure to derive keywords from the set of scientific abstracts related to a gene. Our system is based on the automated extraction of mappings between related terms from different databases using a model of fuzzy associations that can be applied with all generality to any pair of linked databases. We tested the system by annotating genes of the SWISS-PROT database with keywords derived from the abstracts linked to their entries (stored in the MEDLINE database of scientific references). The performance of the annotation procedure was much better for SWISS-PROT keywords (recall of 47%, precision of 68%) than for Gene Ontology terms (recall of 8%, precision of 67%). The algorithm can be publicly accessed and used for the annotation of sequences through a web server at http://www.bork.embl.de/kat
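
The recall and precision figures quoted in this record can be made concrete with a short Python sketch. The counting functions follow the standard definitions; the F1 score is a derived summary that the abstract does not report and is included here only as an illustrative way to compare the two vocabularies.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted keywords that are correct."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of reference keywords that were recovered."""
    return true_pos / (true_pos + false_neg)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Precision and recall as reported in the abstract; F1 is derived, not reported.
f1_swissprot = f1_score(p=0.68, r=0.47)
f1_gene_ontology = f1_score(p=0.67, r=0.08)
```

The two settings have almost the same precision, so the gap in the derived F1 values comes almost entirely from the difference in recall.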

  18. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  19. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. 
Other features include dynamic visualization of

  20. HMM-Based Gene Annotation Methods

    Energy Technology Data Exchange (ETDEWEB)

    Haussler, David; Hughey, Richard; Karplus, Keven

    1999-09-20

    Development of new statistical methods and computational tools to identify genes in human genomic DNA, and to provide clues to their functions by identifying features such as transcription factor binding sites, tissue-specific expression and splicing patterns, and remote homologies at the protein level with genes of known function.

  1. Construction of coffee transcriptome networks based on gene annotation semantics

    Directory of Open Access Journals (Sweden)

    Castillo Luis F.

    2012-12-01

    Full Text Available Gene annotation is a process that encompasses multiple approaches to the analysis of nucleic acid or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and Interproscan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.

  2. Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae

    Directory of Open Access Journals (Sweden)

    He Ningjia

    2008-01-01

    Full Text Available Abstract Background The most abundant family of insect cuticular proteins, the CPR family, is recognized by the R&R Consensus, a domain of about 64 amino acids that binds to chitin and is present throughout arthropods. Several species have now been shown to have more than 100 CPR genes, inviting speculation as to the functional importance of this large number and diversity. Results We have identified 156 genes in Anopheles gambiae that code for putative cuticular proteins in this CPR family, over 1% of the total number of predicted genes in this species. Annotation was verified using several criteria including identification of TATA boxes, INRs, and DPEs plus support from proteomic and gene expression analyses. Two previously recognized CPR classes, RR-1 and RR-2, form separate, well-supported clades with the exception of a small set of genes with long branches whose relationships are poorly resolved. Several of these outliers have clear orthologs in other species. Although both clades are under purifying selection, the RR-1 variant of the R&R Consensus is evolving at twice the rate of the RR-2 variant and is structurally more labile. In contrast, the regions flanking the R&R Consensus have diversified in amino-acid composition to a much greater extent in RR-2 genes compared with RR-1 genes. Many genes are found in compact tandem arrays that may include similar or dissimilar genes but always include just one of the two classes. Tandem arrays of RR-2 genes frequently contain subsets of genes coding for highly similar proteins (sequence clusters). Properties of the proteins indicated that each cluster may serve a distinct function in the cuticle. Conclusion The complete annotation of this large gene family provides insight on the mechanisms of gene family evolution and clues about the need for so many CPR genes. These data also should assist annotation of other Anopheles genes.

  3. Protein Annotation from Protein Interaction Networks and Gene Ontology

    OpenAIRE

    Nguyen, Cao D.; Gardiner, Katheleen J.; Cios, Krzysztof J.

    2011-01-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precis...

  4. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  5. Protein annotation from protein interaction networks and Gene Ontology.

    Science.gov (United States)

    Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

    2011-10-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.
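
    The precision and recall figures quoted above can be folded into a single F1 score (the harmonic mean of the two) to make the three methods easier to compare. The sketch below is illustrative only: the F1 values are derived here from the reported percentages and are not stated in the abstract, and the dictionary keys are labels chosen for this example.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract; the keys are
# illustrative labels, not names used by the authors' software.
reported = {
    "proposed": (0.51, 0.60),
    "majority": (0.45, 0.26),
    "chi_square": (0.24, 0.61),
}

f1_scores = {name: f1(p, r) for name, (p, r) in reported.items()}

for name, score in sorted(f1_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} F1 = {score:.3f}")
```

    Under this summary the proposed method ranks first because it is the only one of the three with both precision and recall above 50%.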

  6. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
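
    The mutual information feature mentioned above can be illustrated with a small sketch. It computes the standard mutual information between two binary events (a sequence matches the motif; a sequence carries the GO term) from a 2x2 table of co-occurrence counts. This is an assumption about the general form of such a feature, not the paper's exact definition, and the function name and counts are hypothetical.

```python
import math


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information in bits between two binary events.

    n11: sequences that match the motif AND carry the GO term
    n10: sequences that match the motif only
    n01: sequences that carry the GO term only
    n00: sequences with neither
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("empty contingency table")
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    # Each cell: (joint count, motif marginal, term marginal).
    cells = [
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    ]
    mi = 0.0
    for joint, row, col in cells:
        if joint > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (joint / n) * math.log2(joint * n / (row * col))
    return mi


# Hypothetical counts: a motif and a GO term that always co-occur,
# and a pair that is statistically independent.
mi_perfect = mutual_information(50, 0, 0, 50)
mi_independent = mutual_information(25, 25, 25, 25)
```

    A score like this could then serve as one input feature, alongside frequency-based features, to a logistic regression classifier of the kind the abstract describes.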

  7. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks. © The Author(s) 2014. Published by Oxford University Press.

  8. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    Science.gov (United States)

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. 
nucleosome positioning, identification of cis

  9. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  10. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.
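
    The per-group Pearson correlations described in the methodology can be sketched as follows. This is a minimal illustration, assuming each gene in a group has an expression profile measured over the same ordered set of samples. The gene names and values are hypothetical, and the GO semantic similarity step is not covered.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def pairwise_correlations(group):
    """Correlate every pair of genes in one group of duplicated genes."""
    return {
        (g1, g2): pearson(group[g1], group[g2])
        for g1, g2 in combinations(sorted(group), 2)
    }


# Hypothetical expression profiles for one group of co-localised duplicates.
group = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(group)
```

    Here geneA and geneB are perfectly co-expressed, while geneC runs opposite to both, the kind of divergence between duplicates that the background section refers to.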

  11. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    OpenAIRE

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, Jörg

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.

  12. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
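
    The single-linkage agglomeration behind the DAVID Gene Concept can be pictured as finding connected components over identifier cross-references: any two identifiers linked by a cross-reference end up in the same gene cluster. The sketch below uses a union-find structure to do this. It is an illustration of the general technique under that assumption, not DAVID's actual implementation, and the identifiers are made up.

```python
def cluster_identifiers(pairs):
    """Group identifiers into clusters by single linkage.

    pairs: iterable of (id_a, id_b) cross-references. Two identifiers share
    a cluster if a chain of cross-references connects them.
    Returns a sorted list of sorted identifier lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical cross-references between identifier systems.
xrefs = [
    ("NM_0001", "P00001"),
    ("P00001", "ENSG0001"),
    ("NM_0002", "P00002"),
]
gene_clusters = cluster_identifiers(xrefs)
```

    Single linkage is simple and scales to very large identifier sets, but one wrong cross-reference is enough to merge two unrelated clusters, which is a general caveat of the technique.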

  13. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    Science.gov (United States)

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the

  14. Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation

    Energy Technology Data Exchange (ETDEWEB)

    Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

    2008-10-27

    Hypothetical and conserved hypothetical genes account for >30% of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved hypothetical (9.5%) along with 887 hypothetical genes (24.4%). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of 1234 hypothetical and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. 1212 of these genes were transcribed with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.
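
    The counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above. The transcribed and protein-detected fractions are derived here for illustration and are not quoted in the abstract.

```python
total_genes = 3634
conserved_hypothetical = 347
hypothetical = 887


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


# The two annotation classes together form the set that was profiled.
profiled = conserved_hypothetical + hypothetical

pct_conserved = percent(conserved_hypothetical, total_genes)
pct_hypothetical = percent(hypothetical, total_genes)

# Derived fractions of the profiled set (not stated in the abstract).
pct_transcribed = percent(1212, profiled)
pct_with_protein = percent(786, profiled)

print(profiled, pct_conserved, pct_hypothetical, pct_transcribed, pct_with_protein)
```

    The sum of the two classes matches the 1234 genes whose expression profiles were analysed, and the two percentages agree with the values given in parentheses.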

  15. NuChart: an R package to study gene spatial neighbourhoods with multi-omics annotations.

    Directory of Open Access Journals (Sweden)

    Ivan Merelli

    Full Text Available Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between colocalization and coregulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. Here, we describe NuChart, an R package that allows the user to annotate and statistically analyse a list of input genes with information relying on Hi-C data, integrating knowledge about genomic features that are involved in the chromosome spatial organization. NuChart works directly with sequenced reads to identify the related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. Predictions about CTCF binding sites, isochores and cryptic Recombination Signal Sequences are provided directly with the package for mapping, although other annotation data in bed format can be used (such as methylation profiles and histone patterns). Gene expression data can be automatically retrieved and processed from the Gene Expression Omnibus and ArrayExpress repositories to highlight the expression profile of genes in the identified neighbourhood. Moreover, statistical inferences about the graph structure and correlations between its topology and multi-omics features can be performed using Exponential-family Random Graph Models. The Hi-C fragment visualisation provided by NuChart allows the comparisons of cells in different conditions, thus providing the possibility of novel biomarkers identification. 
NuChart is compliant with the Bioconductor standard and it is freely

  16. Annotation and comparative analysis of the glycoside hydrolase genes in Brachypodium distachyon

    Directory of Open Access Journals (Sweden)

    Wu Jiajie

    2010-10-01

    Full Text Available Abstract Background Glycoside hydrolases cleave the bond between a carbohydrate and another carbohydrate, a protein, lipid or other moiety. Genes encoding glycoside hydrolases are found in a wide range of organisms, from archaea to animals, and are relatively abundant in plant genomes. In plants, these enzymes are involved in diverse processes, including starch metabolism, defense, and cell-wall remodeling. Glycoside hydrolase genes have been previously cataloged for Oryza sativa (rice), the model dicotyledonous plant Arabidopsis thaliana, and the fast-growing tree Populus trichocarpa (poplar). To improve our understanding of glycoside hydrolases in plants generally and in grasses specifically, we annotated the glycoside hydrolase genes in the grasses Brachypodium distachyon (an emerging monocotyledonous model) and Sorghum bicolor (sorghum). We then compared the glycoside hydrolases across species, at the levels of the whole genome and individual glycoside hydrolase families. Results We identified 356 glycoside hydrolase genes in Brachypodium and 404 in sorghum. The corresponding proteins fell into the same 34 families that are represented in rice, Arabidopsis, and poplar, helping to define a glycoside hydrolase family profile which may be common to flowering plants. For several glycoside hydrolase families (GH5, GH13, GH18, GH19, GH28, and GH51), we present a detailed literature review together with an examination of the family structures. This analysis of individual families revealed both similarities and distinctions between monocots and eudicots, as well as between species. Shared evolutionary histories appear to be modified by lineage-specific expansions or deletions. Within GH families, the Brachypodium and sorghum proteins generally cluster with those from other monocots. Conclusions This work provides the foundation for further comparative and functional analyses of plant glycoside hydrolases. 
Defining the Brachypodium glycoside hydrolases sets

  17. The Caenorhabditis chemoreceptor gene families

    Directory of Open Access Journals (Sweden)

    Robertson Hugh M

    2008-10-01

    Full Text Available Abstract Background Chemoreceptor proteins mediate the first step in the transduction of environmental chemical stimuli, defining the breadth of detection and conferring stimulus specificity. Animal genomes contain families of genes encoding chemoreceptors that mediate taste, olfaction, and pheromone responses. The size and diversity of these families reflect the biology of chemoperception in specific species. Results Based on manual curation and sequence comparisons among putative G-protein-coupled chemoreceptor genes in the nematode Caenorhabditis elegans, we identified approximately 1300 genes and 400 pseudogenes in the 19 largest gene families, most of which fall into larger superfamilies. In the related species C. briggsae and C. remanei, we identified most or all genes in each of the 19 families. For most families, C. elegans has the largest number of genes and C. briggsae the smallest number, suggesting changes in the importance of chemoperception among the species. Protein trees reveal family-specific and species-specific patterns of gene duplication and gene loss. The frequency of strict orthologs varies among the families, from just over 50% in two families to less than 5% in three families. Several families include large species-specific expansions, mostly in C. elegans and C. remanei. Conclusion Chemoreceptor gene families in Caenorhabditis species are large and evolutionarily dynamic as a result of gene duplication and gene loss. These dynamics shape the chemoreceptor gene complements in Caenorhabditis species and define the receptor space available for chemosensory responses. To explain these patterns, we propose the gray pawn hypothesis: individual genes are of little significance, but the aggregate of a large number of diverse genes is required to cover a large phenotype space.

  18. The Caenorhabditis chemoreceptor gene families.

    Science.gov (United States)

    Thomas, James H; Robertson, Hugh M

    2008-10-06

    Chemoreceptor proteins mediate the first step in the transduction of environmental chemical stimuli, defining the breadth of detection and conferring stimulus specificity. Animal genomes contain families of genes encoding chemoreceptors that mediate taste, olfaction, and pheromone responses. The size and diversity of these families reflect the biology of chemoperception in specific species. Based on manual curation and sequence comparisons among putative G-protein-coupled chemoreceptor genes in the nematode Caenorhabditis elegans, we identified approximately 1300 genes and 400 pseudogenes in the 19 largest gene families, most of which fall into larger superfamilies. In the related species C. briggsae and C. remanei, we identified most or all genes in each of the 19 families. For most families, C. elegans has the largest number of genes and C. briggsae the smallest number, suggesting changes in the importance of chemoperception among the species. Protein trees reveal family-specific and species-specific patterns of gene duplication and gene loss. The frequency of strict orthologs varies among the families, from just over 50% in two families to less than 5% in three families. Several families include large species-specific expansions, mostly in C. elegans and C. remanei. Chemoreceptor gene families in Caenorhabditis species are large and evolutionarily dynamic as a result of gene duplication and gene loss. These dynamics shape the chemoreceptor gene complements in Caenorhabditis species and define the receptor space available for chemosensory responses. To explain these patterns, we propose the gray pawn hypothesis: individual genes are of little significance, but the aggregate of a large number of diverse genes is required to cover a large phenotype space.

  19. Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Tu Kang

    2007-06-01

    Full Text Available Abstract Background The wide use of Affymetrix microarray in broadened fields of biological research has made the probeset annotation an important issue. Standard Affymetrix probeset annotation is at gene level, i.e. a probeset is precisely linked to a gene, and probeset intensity is interpreted as gene expression. The increased knowledge that one gene may have multiple transcript variants clearly brings up the necessity of updating this gene-level annotation to a refined transcript-level. Results Through performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, Mouse Genome 430A 2.0 Array and Human Genome U133A Array. Application of our new annotations in re-examining existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of their experimental data, which may lead to discovery of more profound regulatory mechanism.

  20. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

    Science.gov (United States)

    Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

    2013-08-01

    Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

  1. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  2. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    Science.gov (United States)

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO Terms) that are associated to one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. There are different approaches of analysis, among those, the use of association rules (AR) which provides useful knowledge, discovering biologically relevant associations between terms of GO, not previously known. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state of the art approaches.

  3. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  4. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    Science.gov (United States)

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 

  5. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

    Science.gov (United States)

    Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

    2010-07-02

    An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, supports INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating the reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data
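The tag-counting step described in this record (counting how many sequenced reads fall within the genomic ranges of functional annotations) can be illustrated with a minimal Python sketch. This is not TASE's implementation, which the abstract says is written in Java with a SQL Server backend; the record layouts, the function name `count_tags`, and the containment rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layouts (not taken from TASE or CASAVA):
#   annotation = (gene_id, chromosome, start, end), coordinates inclusive
#   read       = (chromosome, start, end), coordinates inclusive
Annotation = Tuple[str, str, int, int]
Read = Tuple[str, int, int]


def count_tags(annotations: List[Annotation], reads: Iterable[Read]) -> Dict[str, int]:
    """Count reads that lie entirely within each annotated genomic range."""
    counts: Counter = Counter()
    for chrom, r_start, r_end in reads:
        for gene_id, a_chrom, a_start, a_end in annotations:
            # Assumption: a read counts only if it is fully contained in the range.
            if chrom == a_chrom and a_start <= r_start and r_end <= a_end:
                counts[gene_id] += 1
    return dict(counts)


example_annotations = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 300, 400)]
example_reads = [
    ("chr1", 110, 145),  # inside geneA
    ("chr1", 150, 185),  # inside geneA
    ("chr1", 320, 355),  # inside geneB
    ("chr2", 110, 145),  # wrong chromosome, not counted
    ("chr1", 190, 225),  # overhangs geneA, not counted
]
tag_counts = count_tags(example_annotations, example_reads)
```

The nested loop is quadratic and only suitable for toy inputs; a real tool would index annotations by chromosome and position, which is presumably one role of the database backend mentioned in the abstract.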

  6. High-performance web services for querying gene and variant annotation.

    Science.gov (United States)

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
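As a rough illustration of how such a web service might be queried, the Python sketch below only constructs a query URL; it does not contact the service. The `/v3/query` path and the `q`, `species`, and `fields` parameters are assumptions based on the service's public query interface and should be checked against the current MyGene.info documentation. The helper name `build_gene_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: versioned query endpoint under the base URL given in the abstract.
BASE_URL = "http://mygene.info/v3/query"


def build_gene_query_url(term: str,
                         species: str = "human",
                         fields: str = "symbol,name,entrezgene") -> str:
    """Build a gene-query URL; parameter names are assumed, not confirmed here."""
    params = {"q": term, "species": species, "fields": fields}
    return f"{BASE_URL}?{urlencode(params)}"


query_url = build_gene_query_url("cdk2")
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate from the network call makes the query logic easy to test offline.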

  7. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Full Text Available Abstract Background Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the

  8. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.

    Science.gov (United States)

    Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo

    2012-04-25

    Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S

  9. The Genome Sequence of Leishmania (Leishmania) amazonensis: Functional Annotation and Extended Analysis of Gene Models

    Science.gov (United States)

    Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

    2013-01-01

    We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

  10. The Literature on Military Families, 1980: An Annotated Bibliography.

    Science.gov (United States)

    1980-08-01

    Group Psychotherapy, 1964, 14, 374-377. This article examined group psychotherapy which was offered to service personnel and dependents in the...force. 43 Dobrofsky, L. R. The wife: from military dependent to feminist? In E. Hunter (Ed.), Changing families in a changing military system. (DTIC No...because officers’ wives are more socio-economically and educationally like NOW militant feminists. Dobrofsky, L. R., & Batterson, C. T. The military

  11. Computational analyses and annotations of the Arabidopsis peroxidase gene family

    DEFF Research Database (Denmark)

    Østergaard, Lars; Pedersen, Anders Gorm; Jespersen, Hans M.

    1998-01-01

    Classical heme-containing plant peroxidases have been ascribed a wide variety of functional roles related to development, defense, lignification and hormonal signaling. More than 40 peroxidase genes are now known in Arabidopsis thaliana for which functional association is complicated by a general...... containing 40-71% adenine, a rare feature observed also in cDNAs which predominantly encode stress-induced proteins, and which may indicate translational regulation....

  12. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. Protein functions, often referred to as its annotations, are believed to manifest themselves through topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having the general biological significance, such demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on the Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. 
We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology
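The precision and recall rates mentioned in this record follow the standard definitions, which the Python sketch below applies to sets of (protein, term) annotation pairs. The pair representation, the identifiers, and the function name `precision_recall` are hypothetical; the abstract does not describe how the authors actually scored extracted annotations against curated ones.

```python
from typing import Set, Tuple

# Hypothetical representation: one annotation is a (protein_id, term_id) pair.
AnnotationPair = Tuple[str, str]


def precision_recall(predicted: Set[AnnotationPair],
                     reference: Set[AnnotationPair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data with made-up identifiers.
extracted = {("P1", "T1"), ("P1", "T2"), ("P2", "T3"), ("P3", "T9")}
curated = {("P1", "T1"), ("P1", "T2"), ("P2", "T3"),
           ("P2", "T4"), ("P3", "T5"), ("P4", "T6")}
scores = precision_recall(extracted, curated)
```

Here three of the four extracted pairs appear in the curated set (precision 0.75), and those three cover half of the six curated pairs (recall 0.5).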

  13. PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.

    Science.gov (United States)

    Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A

    2015-01-01

    Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations and now we are able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use-case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.

  14. Canine candidate genes for dilated cardiomyopathy: annotation of and polymorphic markers for 14 genes

    Directory of Open Access Journals (Sweden)

    van Oost Bernard A

    2007-10-01

    Full Text Available Abstract Background Dilated cardiomyopathy is a myocardial disease occurring in humans and domestic animals and is characterized by dilatation of the left ventricle, reduced systolic function and increased sphericity of the left ventricle. Dilated cardiomyopathy has been observed in several, mostly large and giant, dog breeds, such as the Dobermann and the Great Dane. A number of genes have been identified, which are associated with dilated cardiomyopathy in the human, mouse and hamster. These genes mainly encode structural proteins of the cardiac myocyte. Results We present the annotation of, and marker development for, 14 of these genes of the dog genome, i.e. α-cardiac actin, caveolin 1, cysteine-rich protein 3, desmin, lamin A/C, LIM-domain binding factor 3, myosin heavy polypeptide 7, phospholamban, sarcoglycan δ, titin cap, α-tropomyosin, troponin I, troponin T and vinculin. A total of 33 Single Nucleotide Polymorphisms were identified for these canine genes and 11 polymorphic microsatellite repeats were developed. Conclusion The presented polymorphisms provide a tool to investigate the role of the corresponding genes in canine Dilated Cardiomyopathy by linkage analysis or association studies.

  15. Canine candidate genes for dilated cardiomyopathy: annotation of and polymorphic markers for 14 genes.

    Science.gov (United States)

    Wiersma, Anje C; Leegwater, Peter Aj; van Oost, Bernard A; Ollier, William E; Dukes-McEwan, Joanna

    2007-10-19

    Dilated cardiomyopathy is a myocardial disease occurring in humans and domestic animals and is characterized by dilatation of the left ventricle, reduced systolic function and increased sphericity of the left ventricle. Dilated cardiomyopathy has been observed in several, mostly large and giant, dog breeds, such as the Dobermann and the Great Dane. A number of genes have been identified, which are associated with dilated cardiomyopathy in the human, mouse and hamster. These genes mainly encode structural proteins of the cardiac myocyte. We present the annotation of, and marker development for, 14 of these genes of the dog genome, i.e. alpha-cardiac actin, caveolin 1, cysteine-rich protein 3, desmin, lamin A/C, LIM-domain binding factor 3, myosin heavy polypeptide 7, phospholamban, sarcoglycan delta, titin cap, alpha-tropomyosin, troponin I, troponin T and vinculin. A total of 33 Single Nucleotide Polymorphisms were identified for these canine genes and 11 polymorphic microsatellite repeats were developed. The presented polymorphisms provide a tool to investigate the role of the corresponding genes in canine Dilated Cardiomyopathy by linkage analysis or association studies.

  16. The Co-regulation Data Harvester: Automating gene annotation starting from a transcriptome database

    Science.gov (United States)

    Tsypin, Lev M.; Turkewitz, Aaron P.

    Identifying co-regulated genes provides a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, a process much slower than genome sequencing per se. Tetrahymena thermophila, a unicellular eukaryote, has been a useful model organism and has a fully sequenced but sparsely annotated genome. One important resource for studying this organism has been an online transcriptomic database. We have developed an automated approach to gene annotation in the context of transcriptome data in T. thermophila, called the Co-regulation Data Harvester (CDH). Beginning with a gene of interest, the CDH identifies co-regulated genes by accessing the Tetrahymena transcriptome database. It then identifies their closely related genes (orthologs) in other organisms by using reciprocal BLAST searches. Finally, it collates the annotations of those orthologs' functions, which provides the user with information to help predict the cellular role of the initial query. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.
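The reciprocal BLAST step in this record can be sketched as a reciprocal-best-hit filter. The Python below assumes BLAST has already been run in both directions and that the results have been reduced to nested dictionaries of scores; the data layout, identifiers, and function names are hypothetical and do not come from the CDH code.

```python
from typing import Dict

# Hypothetical layout: query_id -> {subject_id: bit_score}
HitTable = Dict[str, Dict[str, float]]


def best_hits(scores: HitTable) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (queries with no hits are skipped)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(forward: HitTable, reverse: HitTable) -> Dict[str, str]:
    """Keep pairs where each sequence is the other's best hit in the opposite search."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {query: subject for query, subject in fwd.items() if rev.get(subject) == query}


# Toy scores with made-up identifiers.
forward_hits = {"TtA": {"Hs1": 250.0, "Hs2": 90.0}, "TtB": {"Hs2": 120.0}}
reverse_hits = {"Hs1": {"TtA": 240.0}, "Hs2": {"TtA": 95.0, "TtB": 80.0}}
ortholog_candidates = reciprocal_best_hits(forward_hits, reverse_hits)
```

In this toy case TtB's best hit is Hs2, but Hs2's best hit is TtA, so only the TtA/Hs1 pair survives the reciprocity check.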

  17. Coordinated and sequential transcription of the cyprinid herpesvirus-3 annotated genes.

    Science.gov (United States)

    Ilouze, Maya; Dishon, Arnon; Kotler, Moshe

    2012-10-01

    Cyprinid herpesvirus-3 (CyHV-3) is the cause of a fatal disease in carp and koi fish. The disease is seasonal and appears when water temperatures range from 18 to 28°C. CyHV-3 is a member of the Alloherpesviridae, a family in the Herpesvirales order that encompasses mammalian, avian and reptilian viruses. CyHV-3 is a large double-stranded DNA (dsDNA) herpesvirus with a genome of approximately 295kbp, divergent from other mammalian, avian and reptilian herpesviruses, but bearing several genes similar to cyprinid herpesvirus-1 (CyHV-1), CyHV-2, anguillid herpesvirus-1 (AngHV-1), ictalurid herpesvirus-1 (IcHV-1) and ranid herpes virus-1 (RaHV-1). Here we show that viral DNA synthesis commences 4-8h post-infection (p.i.), and is completely inhibited by pre-treatment with cytosine β-d-arabinofuranoside (Ara-C). Transcription of CyHV-3 genes initiates after infection as early as 1-2h p.i., and precedes viral DNA synthesis. All 156 annotated open reading frames (ORFs) of the CyHV-3 genome are transcribed into RNAs, most of which can be classified into immediate early (IE or α), early (E or β) and late (L or γ) classes, similar to all other herpesviruses. Several ORFs belonging to these groups are clustered along the viral genome. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    2014-08-01

    With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. Results obtained indicate that an efficient integration of GO annotations eliminates redundancy up to 27.08% and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified lack of and missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, thus requiring further curation. This simplifies and facilitates tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.

  19. New genes expressed in human brains: implications for annotating evolving genomes.

    Science.gov (United States)

    Zhang, Yong E; Landback, Patrick; Vibranovski, Maria; Long, Manyuan

    2012-11-01

    New genes have frequently formed and spread to fixation in a wide variety of organisms, constituting abundant sets of lineage-specific genes. It was recently reported that an excess of primate-specific and human-specific genes were upregulated in the brains of fetuses and infants, and especially in the prefrontal cortex, which is involved in cognition. These findings reveal the prevalent addition of new genetic components to the transcriptome of the human brain. More generally, these findings suggest that genomes are continually evolving in both sequence and content, eroding the conservation endowed by common ancestry. Despite increasing recognition of the importance of new genes, we highlight here that these genes are still seriously under-characterized in functional studies and that new gene annotation is inconsistent in current practice. We propose an integrative approach to annotate new genes, taking advantage of functional and evolutionary genomic methods. We finally discuss how the refinement of new gene annotation will be important for the detection of evolutionary forces governing new gene origination. Copyright © 2012 WILEY Periodicals, Inc.

  20. prokaryote genome annotation with GeneScan and GLIMMER

    Indian Academy of Sciences (India)

    Unknown

    The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a ... on whether they need to be trained on a set of genes in order to ..... FP has partial matches to the kdpA gene in C. jejuni.

  1. Lynx web services for annotations and systems analysis of multi-gene disorders.

    Science.gov (United States)

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Seeing the forest for the trees: annotating small RNA producing genes in plants.

    Science.gov (United States)

    Coruh, Ceyda; Shahid, Saima; Axtell, Michael J

    2014-04-01

    A key goal in genomics is the complete annotation of the expressed regions of the genome. In plants, substantial portions of the genome make regulatory small RNAs produced by Dicer-Like (DCL) proteins and utilized by Argonaute (AGO) proteins. These include miRNAs and various types of endogenous siRNAs. Small RNA-seq, enabled by cheap and fast DNA sequencing, has produced an enormous volume of data on plant miRNA and siRNA expression in recent years. In this review, we discuss recent progress in using small RNA-seq data to produce stable and reliable annotations of miRNA and siRNA genes in plants. In addition, we highlight key goals for the future of small RNA gene annotation in plants. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Science.gov (United States)

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
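    The reverse-database search mentioned above is commonly used to estimate a false discovery rate from decoy matches. The sketch below shows one widely used estimate (decoy hits divided by target hits above a score threshold); the abstract does not state the exact formula the authors used, so this is an assumption, as are the function name, the `(score, is_decoy)` layout and the scores. Average Peptide Scoring itself is not modelled here.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring >= threshold.

    `psms` is a list of (score, is_decoy) pairs, where is_decoy marks a
    match against the reversed (decoy) database. The estimate used is
    decoys / targets, one common convention among several.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Made-up peptide-spectrum match scores, for illustration only.
psms = [
    (9.1, False), (8.4, False), (7.9, False), (7.5, True),
    (6.8, False), (6.2, True), (5.0, False),
]
for cutoff in (8.0, 7.0, 5.0):
    print(cutoff, decoy_fdr(psms, cutoff))
```

    Raising the cutoff trades identifications for confidence: at 8.0 no decoys pass, while at 5.0 two decoys pass alongside five targets.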

  4. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    Science.gov (United States)

    2013-01-01

    Background: Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results: We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions: This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  5. The Caenorhabditis chemoreceptor gene families

    OpenAIRE

    Robertson Hugh M; Thomas James H

    2008-01-01

    Background: Chemoreceptor proteins mediate the first step in the transduction of environmental chemical stimuli, defining the breadth of detection and conferring stimulus specificity. Animal genomes contain families of genes encoding chemoreceptors that mediate taste, olfaction, and pheromone responses. The size and diversity of these families reflect the biology of chemoperception in specific species. Results: Based on manual curation and sequence comparisons among putative G-protein-...

  6. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  7. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    OpenAIRE

    Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.

    2009-01-01

    Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were ac...

  8. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain.

    Science.gov (United States)

    Zeng, Tao; Li, Rongjian; Mukkamala, Ravi; Ye, Jieping; Ji, Shuiwang

    2015-05-07

    Profiling gene expression in brain structures at various spatial and temporal scales is essential to understanding how genes regulate the development of brain structures. The Allen Developing Mouse Brain Atlas provides high-resolution 3-D in situ hybridization (ISH) gene expression patterns in multiple developing stages of the mouse brain. Currently, the ISH images are annotated with anatomical terms manually. In this paper, we propose a computational approach to annotate gene expression pattern images in the mouse brain at various structural levels over the course of development. We applied deep convolutional neural network that was trained on a large set of natural images to extract features from the ISH images of developing mouse brain. As a baseline representation, we applied invariant image feature descriptors to capture local statistics from ISH images and used the bag-of-words approach to build image-level representations. Both types of features from multiple ISH image sections of the entire brain were then combined to build 3-D, brain-wide gene expression representations. We employed regularized learning methods for discriminating gene expression patterns in different brain structures. Results show that our approach of using convolutional model as feature extractors achieved superior performance in annotating gene expression patterns at multiple levels of brain structures throughout four developing ages. Overall, we achieved average AUC of 0.894 ± 0.014, as compared with 0.820 ± 0.046 yielded by the bag-of-words approach. Deep convolutional neural network model trained on natural image sets and applied to gene expression pattern annotation tasks yielded superior performance, demonstrating its transfer learning property is applicable to such biological image sets.

  9. Annotating novel genes by integrating synthetic lethals and genomic information

    Directory of Open Access Journals (Sweden)

    Faty Mahamadou

    2008-01-01

    Background: Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results: We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion: We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.

  10. Genome-Wide Comparative Gene Family Classification

    Science.gov (United States)

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221

  11. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
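    The GC3 statistic defined in this abstract (the fraction of cytosine and guanine at the third position of a codon) is simple to compute for one coding sequence. The sketch below is illustrative only: the function name and the example sequence are made up, the sequence is assumed to start in frame, and counting the stop codon is an assumption rather than something the abstract specifies. Only the 0.75286 cutoff comes from the abstract.

```python
GC3_RICH_CUTOFF = 0.75286  # threshold quoted in the abstract


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    thirds = cds[2:usable:3]           # every third base, starting at index 2
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


# Made-up sequence: codons ATG GCC AAG TAA, third positions G, C, G, A.
example = "ATGGCCAAGTAA"
value = gc3(example)
print(value, value >= GC3_RICH_CUTOFF)
```

    The example has three of four third positions as G or C, giving 0.75, which falls just under the GC3-rich cutoff.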

  12. Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H

    2006-01-01

    This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci...... with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST...

  13. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results: 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  14. Gene expression and functional annotation of the human and mouse choroid plexus epithelium.

    Directory of Open Access Journals (Sweden)

    Sarah F Janssen

    BACKGROUND: The choroid plexus epithelium (CPE) is a lobed neuro-epithelial structure that forms the outer blood-brain barrier. The CPE protrudes into the brain ventricles and produces the cerebrospinal fluid (CSF), which is crucial for brain homeostasis. Malfunction of the CPE is possibly implicated in disorders like Alzheimer disease, hydrocephalus or glaucoma. To study human genetic diseases and potential new therapies, mouse models are widely used. This requires a detailed knowledge of similarities and differences in gene expression and functional annotation between the species. The aim of this study is to analyze and compare gene expression and functional annotation of healthy human and mouse CPE. METHODS: We performed 44k Agilent microarray hybridizations with RNA derived from laser dissected healthy human and mouse CPE cells. We functionally annotated and compared the gene expression data of human and mouse CPE using the knowledge database Ingenuity. We searched for common and species specific gene expression patterns and function between human and mouse CPE. We also made a comparison with previously published CPE human and mouse gene expression data. RESULTS: Overall, the human and mouse CPE transcriptomes are very similar. Their major functionalities included epithelial junctions, transport, energy production, neuro-endocrine signaling, as well as immunological, neurological and hematological functions and disorders. The mouse CPE presented two additional functions not found in the human CPE: carbohydrate metabolism and a more extensive list of (neural) developmental functions. We found three genes specifically expressed in the mouse CPE compared to human CPE, being ACE, PON1 and TRIM3 and no human specifically expressed CPE genes compared to mouse CPE. CONCLUSION: Human and mouse CPE transcriptomes are very similar, and display many common functionalities. Nonetheless, we also identified a few genes and pathways which suggest that the CPE

  15. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

    Science.gov (United States)

    Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

    2013-09-01

    GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.

  16. OAHG: an integrated resource for annotating human genes with multi-level ontologies.

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-10-05

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which can help further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e-16).

  17. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
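    The pan-genome is described above as the sum of the genetic repertoire across isolates; the core genome is conventionally the subset of genes shared by every isolate. The sketch below illustrates only those two set definitions. It is not how PanCoreGen works internally, and the function name, isolate labels and gene-family labels are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene sets for a mapping isolate -> gene families.

    pan  = union of all isolates' genes (the total repertoire)
    core = intersection (genes present in every isolate)
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Made-up isolates and gene-family labels, for illustration only.
isolates = {
    "isolate1": {"famA", "famB", "famC"},
    "isolate2": {"famA", "famB", "famD"},
    "isolate3": {"famA", "famC", "famD"},
}
pan, core = pan_and_core(isolates)
print(sorted(pan), sorted(core))
```

    Genes in the pan-genome but outside the core (here famB, famC and famD) are the accessory fraction, which is where group-specific datasets of the kind the abstract mentions would be drawn from.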

  18. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  19. Annotating activation/inhibition relationships to protein-protein interactions using gene ontology relations.

    Science.gov (United States)

    Yim, Soorin; Yu, Hasun; Jang, Dongjin; Lee, Doheon

    2018-04-11

    Signaling pathways can be reconstructed by identifying 'effect types' (i.e. activation/inhibition) of protein-protein interactions (PPIs). Effect types are composed of 'directions' (i.e. upstream/downstream) and 'signs' (i.e. positive/negative), thereby requiring directions as well as signs of PPIs to predict signaling events from PPI networks. Here, we propose a computational method for systemically annotating effect types to PPIs using relations between functional information of proteins. We used regulates, positively regulates, and negatively regulates relations in Gene Ontology (GO) to predict directions and signs of PPIs. These relations indicate both directions and signs between GO terms so that we can project directions and signs between relevant GO terms to PPIs. Independent test results showed that our method is effective for predicting both directions and signs of PPIs. Moreover, our method outperformed a previous GO-based method that did not consider the relations between GO terms. We annotated effect types to human PPIs and validated several highly confident effect types against literature. The annotated human PPIs are available in Additional file 2 to aid signaling pathway reconstruction and network biology research. We annotated effect types to PPIs by using regulates, positively regulates, and negatively regulates relations in GO. We demonstrated that those relations are effective for predicting not only signs, but also directions of PPIs. The usefulness of those relations suggests their potential applications to other types of interactions such as protein-DNA interactions.

  20. Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica.

    Science.gov (United States)

    Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M

    2015-05-15

    The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar to unicellular opisthokont genomes than to other animal genomes.

  1. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view of the current OS knowledge that is quickly and easily accessible to researchers in the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts in the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  2. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  3. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when RNA-Seq is the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/ (contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu). Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
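
    The two inputs named above (a genome assembly and a BAM file of spliced RNA-Seq alignments) are the only things a caller has to supply. As a hedged sketch, the snippet below merely assembles a command line for the `braker.pl` script using its `--genome` and `--bam` options and does not run anything; the file names are hypothetical, and the options should be checked against the documentation of the installed BRAKER release.

```python
import shlex


def braker_command(genome_fasta: str, rnaseq_bam: str) -> list:
    """Build (but do not execute) a braker.pl invocation as an argument list."""
    return [
        "braker.pl",
        f"--genome={genome_fasta}",  # genome assembly file
        f"--bam={rnaseq_bam}",       # spliced RNA-Seq alignments in BAM format
    ]


cmd = braker_command("genome.fa", "rnaseq.bam")
# shlex.join quotes arguments safely for display or logging.
printable = shlex.join(cmd)
```

    Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.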

  4. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report.

    Directory of Open Access Journals (Sweden)

    Paul D Thomas

    A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: (1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and (2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the

  5. Avoiding inconsistencies over time and tracking difficulties in Applied Biosystems AB1700™/Panther™ probe-to-gene annotations

    Directory of Open Access Journals (Sweden)

    Benecke Arndt

    2005-12-01

    BACKGROUND: Significant inconsistencies in probe-to-gene annotations between different releases of probe set identifiers by commercial microarray platform solutions have been reported. Such inconsistencies lead to misleading or ambiguous interpretation of published gene expression results. RESULTS: We report here similar inconsistencies in the probe-to-gene annotation of Applied Biosystems AB1700 data, demonstrating that this is not an isolated concern. Moreover, the online information source PANTHER does not provide the information required to track such inconsistencies; hence, even correctly annotated datasets, when resubmitted after PANTHER has been updated to a new probe-to-gene annotation release, will generate differing results without any feedback on the origin of the change. CONCLUSION: The importance of unequivocal annotation of microarray experiments cannot be overstated. Inconsistencies greatly diminish the usefulness of the technology. Novel methods in the analysis of transcriptome profiles often rely on large disparate datasets stemming from multiple sources. The predictive and analytic power of such approaches rapidly diminishes if only least-common subsets can be used for analysis. We present here the information that needs to be provided together with the raw AB1700 data, and the information required together with the biologic interpretation of such data to avoid inconsistencies and tracking difficulties.
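
    The kind of tracking described here can be pictured as a diff between two annotation releases. The sketch below assumes each release is available as a plain probe-to-gene mapping; the probe and gene identifiers are made up, and nothing here reflects an actual PANTHER or AB1700 file format.

```python
def diff_releases(old: dict, new: dict) -> dict:
    """Classify every probe by how its gene assignment changed between releases."""
    report = {"unchanged": [], "reassigned": [], "dropped": [], "added": []}
    for probe in sorted(set(old) | set(new)):
        if probe not in new:
            report["dropped"].append(probe)
        elif probe not in old:
            report["added"].append(probe)
        elif old[probe] == new[probe]:
            report["unchanged"].append(probe)
        else:
            # Keep both assignments so the origin of the change is traceable.
            report["reassigned"].append((probe, old[probe], new[probe]))
    return report


# Hypothetical releases: probe id -> gene symbol.
release_a = {"p1": "GENE1", "p2": "GENE2", "p3": "GENE3"}
release_b = {"p1": "GENE1", "p2": "GENE9", "p4": "GENE4"}
report = diff_releases(release_a, release_b)
```

    Storing such a report, together with the release identifiers, alongside the raw data is one way to make later re-analysis reproducible.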

  6. tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

    Science.gov (United States)

    Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

    2014-01-01

    The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.

  7. A database of annotated promoters of genes associated with common respiratory and related diseases

    KAUST Repository

    Chowdhary, Rajesh; Tan, Sinlam; Pavesi, Giulio; Jin, Gg; Dong, Difeng; Mathur, Sameer K.; Burkart, Arthur; Narang, Vipin; Glurich, Ingrid E.; Raby, Benjamin A.; Weiss, Scott T.; Limsoon, Wong; Liu, Jun; Bajic, Vladimir B.

    2012-01-01

    Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers. Copyright © 2012 by the American Thoracic Society.

  8. A database of annotated promoters of genes associated with common respiratory and related diseases

    KAUST Repository

    Chowdhary, Rajesh

    2012-07-01

    Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers. Copyright © 2012 by the American Thoracic Society.

  9. Gene expression and functional annotation of the human ciliary body epithelia.

    Directory of Open Access Journals (Sweden)

    Sarah F Janssen

    PURPOSE: The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecular signatures for the NPE and PE and studied possible new clues for glaucoma. METHODS: We isolated NPE and PE cells from seven healthy human donor eyes using laser dissection microscopy. Next, we performed RNA isolation, amplification, labeling and hybridization against 44×k Agilent microarrays. For microarray confirmations, we used a literature study, RT-PCRs, and immunohistochemical stainings. We analyzed the gene expression data with R and with the knowledge database Ingenuity. RESULTS: The gene expression profiles and functional annotations of the NPE and PE were highly similar. We found that the most important functionalities of the NPE and PE were related to developmental processes, neural nature of the tissue, endocrine and metabolic signaling, and immunological functions. In total 1576 genes differed statistically significantly between NPE and PE. From these genes, at least 3 were cell-specific for the NPE and 143 for the PE. Finally, we observed high expression in the (N)PE of 35 genes previously implicated in molecular mechanisms related to glaucoma. CONCLUSION: Our gene expression analysis suggested that the NPE and PE of the CB were quite similar. Nonetheless, cell-type specific differences were found. The molecular machineries of the human NPE and PE are involved in a range of neuro-endocrinological, developmental and immunological functions, and perhaps glaucoma.

  10. Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus.

    Science.gov (United States)

    Carroll, Ronan K; Weiss, Andy; Broach, William H; Wiemels, Richard E; Mogen, Austin B; Rice, Kelly C; Shaw, Lindsey N

    2016-02-09

    In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work

  11. trieFinder: an efficient program for annotating Digital Gene Expression (DGE) tags.

    Science.gov (United States)

    Renaud, Gabriel; LaFave, Matthew C; Liang, Jin; Wolfsberg, Tyra G; Burgess, Shawn M

    2014-10-13

    Quantification of a transcriptional profile is a useful way to evaluate the activity of a cell at a given point in time. Although RNA-Seq has revolutionized transcriptional profiling, the costs of RNA-Seq are still significantly higher than microarrays, and often the depth of data delivered from RNA-Seq is in excess of what is needed for simple transcript quantification. Digital Gene Expression (DGE) is a cost-effective, sequence-based approach for simple transcript quantification: by sequencing one read per molecule of RNA, this technique can be used to efficiently count transcripts while obviating the need for transcript-length normalization and reducing the total numbers of reads necessary for accurate quantification. Here, we present trieFinder, a program specifically designed to rapidly map, parse, and annotate DGE tags of various lengths against cDNA and/or genomic sequence databases. The trieFinder algorithm maps DGE tags in a two-step process. First, it scans FASTA files of RefSeq, UniGene, and genomic DNA sequences to create a database of all tags that can be derived from a predefined restriction site. Next, it compares the experimental DGE tags to this tag database, taking advantage of the fact that the tags are stored as a prefix tree, or "trie", which allows for linear-time searches for exact matches. DGE tags with mismatches are analyzed by recursive calls in the data structure. We find that, in terms of alignment speed, the mapping functionality of trieFinder compares favorably with Bowtie. trieFinder can quickly provide the user an annotation of the DGE tags from three sources simultaneously, simplifying transcript quantification and novel transcript detection, delivering the data in a simple parsed format, obviating the need to post-process the alignment results. trieFinder is available at http://research.nhgri.nih.gov/software/trieFinder/.
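
    The two lookup modes described above (exact matching in a prefix tree, and recursive descent for tags with mismatches) can be sketched in a few lines. This is an illustration of the data structure only, not trieFinder's code: it handles substitutions but not indels, the class and method names are hypothetical, and the tags and annotations are toy values.

```python
class TagTrie:
    """Prefix tree over tag sequences; annotations are stored at terminal nodes."""

    END = "$"  # sentinel key; cannot collide with nucleotide characters

    def __init__(self):
        self.root = {}

    def insert(self, tag, annotation):
        node = self.root
        for base in tag:
            node = node.setdefault(base, {})
        node.setdefault(self.END, []).append(annotation)

    def exact(self, tag):
        """Exact lookup: one dictionary step per base of the query tag."""
        node = self.root
        for base in tag:
            node = node.get(base)
            if node is None:
                return []
        return list(node.get(self.END, []))

    def with_mismatches(self, tag, max_mismatches=1):
        """Recursive lookup that tolerates up to `max_mismatches` substitutions."""
        hits = []

        def walk(node, pos, budget):
            if pos == len(tag):
                hits.extend(node.get(self.END, []))
                return
            for base, child in node.items():
                if base == self.END:
                    continue
                cost = 0 if base == tag[pos] else 1
                if cost <= budget:
                    walk(child, pos + 1, budget - cost)

        walk(self.root, 0, max_mismatches)
        return hits


trie = TagTrie()
trie.insert("CATGAAAC", "geneA")
trie.insert("CATGAATC", "geneB")
exact_hits = trie.exact("CATGAAAC")
fuzzy_hits = sorted(trie.with_mismatches("CATGAAAC", max_mismatches=1))
```

    The exact search costs one step per base of the tag regardless of how many tags are stored, which is the property the abstract refers to as linear-time search.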

  12. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    BACKGROUND: Many crucial cellular operations such as metabolism, signalling, and regulations are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, comes from the large number of false positive predictions in computational approaches. Reduction of false positive predictions and enhancing the true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated. RESULTS: Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism, are 48.32% and 46.49% (by average of four datasets) in yeast and worm, respectively. Based on eight top-ranking keywords and co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of improvement provided by the rules, was defined based on the signal-to-noise ratio and implemented to measure the applicability of knowledge rules applying to the predicted PPI datasets. Depending on the employed PPI-predicting methods, the strength varies between two and ten-fold of randomly removing protein pairs from the datasets. CONCLUSION: Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially
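
    A rule-based filter of this kind can be sketched as follows. The abstract does not spell out the two rules or how they are combined, so everything below is an assumption made for illustration: a pair is kept if both proteins carry a top-ranking keyword in their GO molecular function annotations, or if the two proteins share a cellular location. Keywords, protein names and annotations are invented.

```python
def has_keyword(protein, mf_terms, keywords):
    """True if any molecular-function annotation of the protein contains a keyword."""
    return any(k in term for term in mf_terms.get(protein, ()) for k in keywords)


def passes_rules(pair, mf_terms, locations, keywords):
    """Assumed rules: keyword support for both partners OR a shared location."""
    a, b = pair
    keyword_rule = has_keyword(a, mf_terms, keywords) and has_keyword(b, mf_terms, keywords)
    colocalized = bool(locations.get(a, set()) & locations.get(b, set()))
    return keyword_rule or colocalized


def fraction_passing(pairs, mf_terms, locations, keywords):
    """Share of pairs that survive the filter (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    kept = sum(passes_rules(p, mf_terms, locations, keywords) for p in pairs)
    return kept / len(pairs)


keywords = ["binding", "kinase"]  # hypothetical top-ranking keywords
mf_terms = {
    "P1": ["protein kinase activity"],
    "P2": ["ATP binding"],
    "P3": ["structural constituent of ribosome"],
    "P4": ["transporter activity"],
}
locations = {"P1": {"nucleus"}, "P2": {"cytoplasm"}, "P3": {"cytoplasm"}, "P4": {"membrane"}}
predicted = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]
kept_fraction = fraction_passing(predicted, mf_terms, locations, keywords)
```

    Applied to a set of experimentally confirmed pairs, `fraction_passing` corresponds to the sensitivity of the rules; applied to predicted pairs, it shows how much of the dataset the filter would retain.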

  13. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    Science.gov (United States)

    Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

    2015-12-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
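
    For readers unfamiliar with the two reported metrics, accuracy and the Matthews correlation coefficient (MCC) are both computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the example counts are arbitrary and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Arbitrary example: 45 true positives, 45 true negatives, 5 errors of each kind.
acc = accuracy(45, 45, 5, 5)
mcc = matthews_cc(45, 45, 5, 5)
```

    Unlike accuracy, MCC stays informative when coding and non-coding classes are unbalanced, which is why the two are often reported together.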

  14. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    International Nuclear Information System (INIS)

    Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

    2015-01-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)

  15. Sequence-based heuristics for faster annotation of non-coding RNA families.

    Science.gov (United States)

    Weinberg, Zasha; Ruzzo, Walter L

    2006-01-01

    Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be. In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, where our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that--unlike family-specific solutions--can scale to hundreds of ncRNA families. The source code is available under GNU Public License at the supplementary web site.

  16. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.

  17. Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

    Directory of Open Access Journals (Sweden)

    Thomas H A Ederveen

    Full Text Available Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines (AGEs) provide users a straightforward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35-52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating the likelihood of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs whose start codons are notoriously difficult to predict. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and, with an optimal path, ORF start prediction sensitivity is gained while maintaining a high specificity.
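
    The consensus-path idea described above can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the engine names, the path and the projected confidence values are all hypothetical, and a combination is assumed to "fire" only when every engine in it reports the same start coordinate.

```python
from typing import Optional

# Hypothetical path: engine combinations ordered from high to low specificity,
# each paired with an illustrative projected confidence value.
PATH = [
    (("engine_a", "engine_b", "engine_c"), 0.95),
    (("engine_a", "engine_b"), 0.90),
    (("engine_a",), 0.80),
]

def call_start(predictions: dict[str, int],
               path=PATH) -> Optional[tuple[int, float]]:
    """Return (start coordinate, projected confidence) for one ORF.

    `predictions` maps engine name -> predicted start coordinate.
    The path is walked in order; the first combination whose engines
    all agree on a single start coordinate decides the call.
    """
    for engines, confidence in path:
        starts = {predictions.get(engine) for engine in engines}
        # All engines in the combination must have made the same prediction.
        if len(starts) == 1 and None not in starts:
            return starts.pop(), confidence
    return None  # no combination on the path agrees; flag for manual curation

print(call_start({"engine_a": 100, "engine_b": 100, "engine_c": 130}))
```

    ORFs that only resolve late on the path, or not at all, are the ones a curator would want to inspect first.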

  18. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task to find the biological significance of the results and validate them has so far mostly fallen to the biological expert who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms have similar meaning) that would be encountered if terms would be extracted directly from the articles due to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes

  19. Canine candidate genes for dilated cardiomyopathy: annotation of and polymorphic markers for 14 genes

    OpenAIRE

    Wiersma, Anje C; Leegwater, Peter AJ; van Oost, Bernard A; Ollier, William E; Dukes-McEwan, Joanna

    2007-01-01

    Abstract Background Dilated cardiomyopathy is a myocardial disease occurring in humans and domestic animals and is characterized by dilatation of the left ventricle, reduced systolic function and increased sphericity of the left ventricle. Dilated cardiomyopathy has been observed in several, mostly large and giant, dog breeds, such as the Dobermann and the Great Dane. A number of genes have been identified, which are associated with dilated cardiomyopathy in the human, mouse and hamster. Thes...

  20. Ribosome Profiling Reveals Pervasive Translation Outside of Annotated Protein-Coding Genes

    Directory of Open Access Journals (Sweden)

    Nicholas T. Ingolia

    2014-09-01

    Full Text Available Ribosome profiling suggests that ribosomes occupy many regions of the transcriptome thought to be noncoding, including 5′ UTRs and long noncoding RNAs (lncRNAs). Apparent ribosome footprints outside of protein-coding regions raise the possibility of artifacts unrelated to translation, particularly when they occupy multiple, overlapping open reading frames (ORFs). Here, we show hallmarks of translation in these footprints: copurification with the large ribosomal subunit, response to drugs targeting elongation, trinucleotide periodicity, and initiation at early AUGs. We develop a metric for distinguishing between 80S footprints and nonribosomal sources using footprint size distributions, which validates the vast majority of footprints outside of coding regions. We present evidence for polypeptide production beyond annotated genes, including the induction of immune responses following human cytomegalovirus (HCMV) infection. Translation is pervasive on cytosolic transcripts outside of conserved reading frames, and direct detection of this expanded universe of translated products enables efforts at understanding how cells manage and exploit its consequences.
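
    Trinucleotide periodicity, one of the hallmarks listed above, can be illustrated with a short calculation: footprints from translating ribosomes should fall predominantly in one reading frame relative to the ORF start. The sketch below is a generic illustration, not the metric developed in the paper; it assumes footprint positions have already been reduced to a single consistent reference coordinate per read, and the example values are hypothetical.

```python
from collections import Counter

def frame_fractions(positions, orf_start):
    """Fraction of footprint positions in each reading frame (0, 1, 2).

    `positions` are transcript coordinates of footprints (assumed to be
    pre-processed to one reference point per read); `orf_start` is the
    coordinate of the first nucleotide of the candidate ORF.
    """
    counts = Counter((p - orf_start) % 3 for p in positions)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]

# Hypothetical footprint positions for an ORF starting at coordinate 100.
print(frame_fractions([100, 103, 106, 107], orf_start=100))
```

    A strong bias toward one frame is consistent with translation, whereas roughly equal thirds would be expected from a source with no codon-level phasing.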

  1. Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

    Science.gov (United States)

    Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

    2016-01-01

    WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceous plants.

  2. An integrated and comparative approach towards identification, characterization and functional annotation of candidate genes for drought tolerance in sorghum (Sorghum bicolor (L.) Moench).

    Science.gov (United States)

    Woldesemayat, Adugna Abdi; Van Heusden, Peter; Ndimba, Bongani K; Christoffels, Alan

    2017-12-22

    Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and an in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach that targets only a single gene in a pathway that is related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data. We provide an integrated and comparative in silico candidate gene identification, characterization and annotation approach, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant functionally annotated drought responsive genes (DRGs) were identified using experimental data from drought responses by employing pairwise sequence similarity searches, pathway and interpro-domain analysis, expression profiling and orthology relation. Comparison of the genomic locations between these genes and sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignments (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS and PASA based analysis on the expression dataset. Among these, 50% were single exonic, 69.5% were drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and also had a strong biological network, among categories of genes involved. Identification of these pathways signifies the

  3. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Science.gov (United States)

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions to the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac

  4. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Directory of Open Access Journals (Sweden)

    Nupoor Chowdhary

    Full Text Available Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions to the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly

  5. The ACBP gene family in Rhodnius prolixus

    DEFF Research Database (Denmark)

    Majerowicz, David; Hannibal-Bach, Hans K; Castro, Rodolfo S C

    2016-01-01

    The acyl-CoA-binding proteins (ACBP) constitute a family of conserved proteins that bind acyl-CoA with high affinity and protect it from hydrolysis. Thus, ACBPs may have essential roles in basal cellular lipid metabolism. The genome of the insect Rhodnius prolixus encodes five ACBP genes similar...

  6. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    Science.gov (United States)

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  7. The systematic annotation of the three main GPCR families in Reactome.

    Science.gov (United States)

    Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter

    2010-07-29

    Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.

  8. The Eucalyptus terpene synthase gene family.

    Science.gov (United States)

    Külheim, Carsten; Padovan, Amanda; Hefer, Charles; Krause, Sandra T; Köllner, Tobias G; Myburg, Alexander A; Degenhardt, Jörg; Foley, William J

    2015-06-11

    Terpenoids are abundant in the foliage of Eucalyptus, providing the characteristic smell as well as being valuable economically and influencing ecological interactions. Quantitative and qualitative inter- and intra- specific variation of terpenes is common in eucalypts. The genome sequences of Eucalyptus grandis and E. globulus were mined for terpene synthase genes (TPS) and compared to other plant species. We investigated the relative expression of TPS in seven plant tissues and functionally characterized five TPS genes from E. grandis. Compared to other sequenced plant genomes, Eucalyptus grandis has the largest number of putative functional TPS genes of any sequenced plant. We discovered 113 and 106 putative functional TPS genes in E. grandis and E. globulus, respectively. All but one TPS from E. grandis were expressed in at least one of seven plant tissues examined. Genomic clusters of up to 20 genes were identified. Many TPS are expressed in tissues other than leaves which invites a re-evaluation of the function of terpenes in Eucalyptus. Our data indicate that terpenes in Eucalyptus may play a wider role in biotic and abiotic interactions than previously thought. Tissue specific expression is common and the possibility of stress induction needs further investigation. Phylogenetic comparison of the two investigated Eucalyptus species gives insight about recent evolution of different clades within the TPS gene family. While the majority of TPS genes occur in orthologous pairs some clades show evidence of recent gene duplication, as well as loss of function.

  9. The human protein disulfide isomerase gene family

    Directory of Open Access Journals (Sweden)

    Galligan James J

    2012-07-01

    Full Text Available Abstract Enzyme-mediated disulfide bond formation is a highly conserved process affecting over one-third of all eukaryotic proteins. The enzymes primarily responsible for facilitating thiol-disulfide exchange are members of an expanding family of proteins known as protein disulfide isomerases (PDIs). These proteins are part of a larger superfamily of proteins known as the thioredoxin protein family (TRX). As members of the PDI family of proteins, all proteins contain a TRX-like structural domain and are predominantly expressed in the endoplasmic reticulum. Subcellular localization and the presence of a TRX domain, however, comprise the short list of distinguishing features required for gene family classification. To date, the PDI gene family contains 21 members, varying in domain composition, molecular weight, tissue expression, and cellular processing. Given their vital role in protein-folding, loss of PDI activity has been associated with the pathogenesis of numerous disease states, most commonly related to the unfolded protein response (UPR). Over the past decade, UPR has become a very attractive therapeutic target for multiple pathologies including Alzheimer disease, Parkinson disease, alcoholic and non-alcoholic liver disease, and type-2 diabetes. Understanding the mechanisms of protein-folding, specifically thiol-disulfide exchange, may lead to development of a novel class of therapeutics that would help alleviate a wide range of diseases by targeting the UPR.

  10. Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix) contributes to the gene repertoire catalogue of the Pucciniales

    Directory of Open Access Journals (Sweden)

    Marco Aurelio Cristancho

    2014-10-01

    Full Text Available Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish

  11. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation.

    Science.gov (United States)

    Rund, Samuel S C; Yoo, Boyoung; Alam, Camille; Green, Taryn; Stephens, Melissa T; Zeng, Erliang; George, Gary F; Sheppard, Aaron D; Duffield, Giles E; Milenković, Tijana; Pfrender, Michael E

    2016-08-18

    Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel
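
    Cosine fitting of the kind mentioned above can be illustrated with a single-component cosinor model, y(t) = M + A cos(2π(t − t_peak)/T), fitted by ordinary least squares. This is a generic sketch and not necessarily the algorithm used in the study; the fixed 24 h period, the 4 h sampling interval and the expression values are assumptions made for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def cosinor_fit(times, values, period=24.0):
    """Least-squares fit of mesor M, amplitude A and peak time (hours).

    The model is linearised as y = M + beta*cos(wt) + gamma*sin(wt),
    so that A = hypot(beta, gamma) and the peak phase is atan2(gamma, beta).
    """
    omega = 2 * math.pi / period
    rows = [[1.0, math.cos(omega * t), math.sin(omega * t)] for t in times]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_linear(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_time

# Hypothetical expression series sampled every 4 h over a 44 h window.
times = list(range(0, 44, 4))
values = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
print(cosinor_fit(times, values))
```

    In practice a gene would be called rhythmic only if the fitted amplitude is significant against a suitable null, a step this sketch leaves out.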

  12. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.
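
    The guilt-by-association principle stated above can be sketched in a few lines: an unannotated gene inherits the most common function among annotated genes whose expression profiles correlate strongly with its own. This is a generic illustration and not Prosecutor's algorithm, which the abstract describes as parameter-free; the correlation threshold, gene names, profiles and function labels below are all hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def suggest_function(query_profile, annotated, min_r=0.8):
    """Suggest a function for an unannotated gene by co-expression voting.

    `annotated` maps gene name -> (expression profile, function label).
    `min_r` is an illustrative cut-off, not a value from the paper.
    """
    votes = Counter()
    for profile, label in annotated.values():
        if pearson(query_profile, profile) >= min_r:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical annotated genes and expression profiles.
annotated = {
    "g1": ([1, 2, 3, 4], "transport"),
    "g2": ([2, 4, 6, 9], "transport"),
    "g3": ([4, 3, 2, 1], "stress response"),
}
print(suggest_function([1, 2, 3, 5], annotated))
```

    A fixed threshold is the simplest possible choice here; it is exactly the kind of user-set parameter that a parameter-free method would avoid.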

  13. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    Directory of Open Access Journals (Sweden)

    Lawton Jennifer

    2012-03-01

    Full Text Available Abstract Background The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family

  14. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II][Archive]

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.3361 >orf19.3361; Contig19-10173; 157397..>158185;... YAT2*; carnitine acetyltransferase; gene family | truncated protein MSTYRFQETLEKLPIPDLVQTCNAYLEALKPLQTEQEHE

  15. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes

    Science.gov (United States)

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-...

  16. Dlx homeobox gene family expression in osteoclasts.

    Science.gov (United States)

    Lézot, F; Thomas, B L; Blin-Wakkach, C; Castaneda, B; Bolanos, A; Hotton, D; Sharpe, P T; Heymann, D; Carles, G F; Grigoriadis, A E; Berdal, A

    2010-06-01

    Skeletal growth and homeostasis require the finely orchestrated secretion of mineralized tissue matrices by highly specialized cells, balanced with their degradation by osteoclasts. Time- and site-specific expression of Dlx and Msx homeobox genes in the cells secreting these matrices have been identified as important elements in the regulation of skeletal morphology. Such specific expression patterns have also been reported in osteoclasts for Msx genes. The aim of the present study was to establish the expression patterns of Dlx genes in osteoclasts and identify their function in regulating skeletal morphology. The expression patterns of all Dlx genes were examined during the whole osteoclastogenesis using different in vitro models. The results revealed that Dlx1 and Dlx2 are the only Dlx family members with a possible function in osteoclastogenesis as well as in mature osteoclasts. Dlx5 and Dlx6 were detected in the cultures but appear to be markers of monocytes and their derivatives. In vivo, Dlx2 expression in osteoclasts was examined using a Dlx2/LacZ transgenic mouse. Dlx2 is expressed in a subpopulation of osteoclasts in association with tooth, brain, nerve, and bone marrow volumetric growths. Altogether the present data suggest a role for Dlx2 in regulation of skeletal morphogenesis via functions within osteoclasts. (c) 2010 Wiley-Liss, Inc.

  17. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. [Genome-wide identification and bioinformatic analysis of PPR gene family in tomato].

    Science.gov (United States)

    Ding, Anming; Li, Ling; Qu, Xu; Sun, Tingting; Chen, Yaqiong; Zong, Peng; Li, Zunqiang; Gong, Daping; Sun, Yuhe

    2014-01-01

    Pentatricopeptide repeats (PPRs) genes constitute one of the largest gene families in plants, which play a broad and essential role in plant growth and development. In this study, the protein sequences annotated by the tomato (S. lycopersicum L.) genome project were screened with the Pfam PPR sequences. A total of 471 putative PPR-encoding genes were identified. Based on the motifs defined in A. thaliana L., protein structure and conserved sequences for each tomato motif were analyzed. We also analyzed phylogenetic relationship, subcellular localization, expression and GO analysis of the identified gene sequences. Our results demonstrate that tomato PPR gene family contains two subfamilies, P and PLS, each accounting for half of the family. PLS subfamily can be divided into four subclasses i.e., PLS, E, E+ and DYW. Each subclass of sequences forms a clade in the phylogenetic tree. The PPR motifs were found highly conserved among plants. The tomato PPR genes were distributed over 12 chromosomes and most of them lack introns. The majority of PPR proteins harbor mitochondrial or chloroplast localization sequences, whereas GO analysis showed that most PPR proteins participate in RNA-related biological processes.

  19. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    Science.gov (United States)

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus is commonly known as hops, a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large set of EST sequences is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs code for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382
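    The SSR mining step summarized in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the minimum repeat counts per motif length are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re
from collections import Counter

# Minimum repeat counts per motif length (1 = mono ... 6 = hexanucleotide).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x 6, which is really a mononucleotide run.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def motif_class_frequencies(hits):
    """Percentage of SSRs falling in each motif-length class."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100.0 * c / total for k, c in counts.items()} if total else {}
```

    Applied separately to contigs and singletons, the per-class percentages returned by `motif_class_frequencies` correspond to the kind of frequency distribution reported in the abstract; the actual values depend on the thresholds chosen.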

  20. Annotated genetic linkage maps of Pinus pinaster Ait. from a Central Spain population using microsatellite and gene based markers.

    Science.gov (United States)

    de Miguel, Marina; de Maria, Nuria; Guevara, M Angeles; Diaz, Luis; Sáez-Laguna, Enrique; Sánchez-Gómez, David; Chancerel, Emilie; Aranda, Ismael; Collada, Carmen; Plomion, Christophe; Cabezas, José-Antonio; Cervera, María-Teresa

    2012-10-04

    Pinus pinaster Ait. is a major resin producing species in Spain. Genetic linkage mapping can facilitate marker-assisted selection (MAS) through the identification of Quantitative Trait Loci and selection of allelic variants of interest in breeding populations. In this study, we report annotated genetic linkage maps for two individuals (C14 and C15) belonging to a breeding program aiming to increase resin production. We use different types of DNA markers, including last-generation molecular markers. We obtained 13 and 14 linkage groups for C14 and C15 maps, respectively. A total of 211 and 215 markers were positioned on each map and estimated genome length was between 1,870 and 2,166 cM respectively, which represents near 65% of genome coverage. Comparative mapping with previously developed genetic linkage maps for P. pinaster based on about 60 common markers enabled aligning linkage groups to this reference map. The comparison of our annotated linkage maps and linkage maps reporting QTL information revealed 11 annotated SNPs in candidate genes that co-localized with previously reported QTLs for wood properties and water use efficiency. This study provides genetic linkage maps from a Spanish population that shows high levels of genetic divergence with French populations from which segregating progenies have been previously mapped. These genetic maps will be of interest to construct a reliable consensus linkage map for the species. The importance of developing functional genetic linkage maps is highlighted, especially when working with breeding populations for its future application in MAS for traits of interest.

  1. Annotated genetic linkage maps of Pinus pinaster Ait. from a Central Spain population using microsatellite and gene based markers

    Directory of Open Access Journals (Sweden)

    de Miguel Marina

    2012-10-01

    Full Text Available Abstract Background Pinus pinaster Ait. is a major resin producing species in Spain. Genetic linkage mapping can facilitate marker-assisted selection (MAS) through the identification of Quantitative Trait Loci and selection of allelic variants of interest in breeding populations. In this study, we report annotated genetic linkage maps for two individuals (C14 and C15) belonging to a breeding program aiming to increase resin production. We use different types of DNA markers, including last-generation molecular markers. Results We obtained 13 and 14 linkage groups for C14 and C15 maps, respectively. A total of 211 and 215 markers were positioned on each map and estimated genome length was between 1,870 and 2,166 cM respectively, which represents near 65% of genome coverage. Comparative mapping with previously developed genetic linkage maps for P. pinaster based on about 60 common markers enabled aligning linkage groups to this reference map. The comparison of our annotated linkage maps and linkage maps reporting QTL information revealed 11 annotated SNPs in candidate genes that co-localized with previously reported QTLs for wood properties and water use efficiency. Conclusions This study provides genetic linkage maps from a Spanish population that shows high levels of genetic divergence with French populations from which segregating progenies have been previously mapped. These genetic maps will be of interest to construct a reliable consensus linkage map for the species. The importance of developing functional genetic linkage maps is highlighted, especially when working with breeding populations for its future application in MAS for traits of interest.

  2. Developmental gene discovery in a hemimetabolous insect: de novo assembly and annotation of a transcriptome for the cricket Gryllus bimaculatus.

    Directory of Open Access Journals (Sweden)

    Victor Zeng

    Full Text Available Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in

  3. Annotated Gene and Proteome Data Support Recognition of Interconnections Between the Results of Different Experiments in Space Research

    Science.gov (United States)

    Bauer, Johann; Wehland, Markus; Pietsch, Jessica; Sickmann, Albert; Weber, Gerhard; Grimm, Daniela

    2016-06-01

    In a series of studies, human thyroid and endothelial cells exposed to real or simulated microgravity were analyzed in terms of changes in gene expression patterns or protein content. Due to the limitation of available cells in many space research experiments, comparative and control experiments had to be done in a serial manner. Therefore, detected genes or proteins were annotated with gene names and SwissProt numbers, in order to allow searches for interconnections between results obtained in different experiments by different methods. A crosscheck of several studies on the behavior of cytoskeletal genes and proteins suggested that clusters of cytoskeletal components change differently under the influence of microgravity and/or vibration in different cell types. The result that LOX and ISG15 gene expression were clearly altered during the Shenzhou-8 spaceflight mission could be estimated by comparison with the results of other experiments. The more than 100-fold down-regulation of LOX supports our hypothesis that the amount and stability of extracellular matrix have a great influence on the formation of three-dimensional aggregates under microgravity. The approximately 40-fold up-regulation of ISG15 cannot yet be explained in detail, but strongly suggests that ISGylation, an alternative form of posttranslational modification, plays a role in longterm cultures.

  4. Genome-wide association study and annotating candidate gene networks affecting age at first calving in Nellore cattle.

    Science.gov (United States)

    Mota, R R; Guimarães, S E F; Fortes, M R S; Hayes, B; Silva, F F; Verardo, L L; Kelly, M J; de Campos, C F; Guimarães, J D; Wenceslau, R R; Penitente-Filho, J M; Garcia, J F; Moore, S

    2017-12-01

    We performed a genome-wide mapping for the age at first calving (AFC) with the goal of annotating candidate genes that regulate fertility in Nellore cattle. Phenotypic data from 762 cows and 777k SNP genotypes from 2,992 bulls and cows were used. Single nucleotide polymorphism (SNP) effects based on the single-step GBLUP methodology were blocked into adjacent windows of 1 Megabase (Mb) to explain the genetic variance. SNP windows explaining more than 0.40% of the AFC genetic variance were identified on chromosomes 2, 8, 9, 14, 16 and 17. From these windows, we identified 123 coding protein genes that were used to build gene networks. From the association study and derived gene networks, putative candidate genes (e.g., PAPPA, PREP, FER1L6, TPR, NMNAT1, ACAD10, PCMTD1, CRH, OPKR1, NPBWR1 and NCOA2) and transcription factors (TF) (STAT1, STAT3, RELA, E2F1 and EGR1) were strongly associated with female fertility (e.g., negative regulation of luteinizing hormone secretion, folliculogenesis and establishment of uterine receptivity). Evidence suggests that AFC inheritance is complex and controlled by multiple loci across the genome. As several windows explaining higher proportion of the genetic variance were identified on chromosome 14, further studies investigating the interaction across haplotypes to better understand the molecular architecture behind AFC in Nellore cattle should be undertaken. © 2017 Blackwell Verlag GmbH.

  5. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

    Directory of Open Access Journals (Sweden)

    Tadashi Imanishi

    2004-06-01

    Full Text Available The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. 
We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA

  6. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  7. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and other 126 transcripts to contain target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila toward thermal stresses, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primers were designed for investigating the genetic diversity in future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying

  8. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    Science.gov (United States)

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    The PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse functions of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out a comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that the PHB genes form a relatively conserved gene family. The PHB genes can be classified into 5 classes and each class has a very deep evolutionary origin. The PHB genes within each class maintained the same motif patterns during evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under evolutionary pressure to derive new functions. The PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on similar molecular mechanisms. 
The highly diverse biological function indicated that more research needs to be carried out

  9. Identification of the 14-3-3 gene family in Rafflesia cantleyi

    Science.gov (United States)

    Rosli, Khadijah; Wan, Kiew-Lian

    2018-04-01

    Rafflesia is known to be the largest flower in the world. Due to its size and appearance, it is considered to be very unique. Little is known about the molecular biology of this rare parasitic flowering plant as it is very difficult to locate and has a short life-span as a flower. Physiological activities in plants are regulated by signalling regulators such as the members of the 14-3-3 gene family. The number of members of this gene family varies in plants and there are thirteen known members in Arabidopsis thaliana. Their role is to bind to phosphorylated targets to complete signal transduction processes. Sequence comparison using BLAST of transcriptome data from three different Rafflesia cantleyi floral bud stages against the Swissprot database revealed 27 transcripts annotated as members of this gene family. All of the transcripts were expressed during floral bud stage 1 (S1) while 14 and four transcripts were expressed during floral bud stages 2 (S2) and 3 (S3), respectively. Significant downregulation was recorded for six and nine transcripts at S1 vs. S2 and S2 vs. S3 respectively. This gene family may play a critical role as signalling regulators during the development of Rafflesia floral bud.

  10. The role of retrotransposons in gene family expansions: insights from the mouse Abp gene family.

    Science.gov (United States)

    Janoušek, Václav; Karn, Robert C; Laukaitis, Christina M

    2013-05-29

    Retrotransposons have been suggested to provide a substrate for non-allelic homologous recombination (NAHR) and thereby promote gene family expansion. Their precise role, however, is controversial. Here we ask whether retrotransposons contributed to the recent expansions of the Androgen-binding protein (Abp) gene families that occurred independently in the mouse and rat genomes. Using dot plot analysis, we found that the most recent duplication in the Abp region of the mouse genome is flanked by L1Md_T elements. Analysis of the sequence of these elements revealed breakpoints that are the relicts of the recombination that caused the duplication, confirming that the duplication arose as a result of NAHR using L1 elements as substrates. L1 and ERVII retrotransposons are considerably denser in the Abp regions than in one Mb flanking regions, while other repeat types are depleted in the Abp regions compared to flanking regions. L1 retrotransposons preferentially accumulated in the Abp gene regions after lineage separation and roughly followed the pattern of Abp gene expansion. By contrast, the proportion of shared vs. lineage-specific ERVII repeats in the Abp region resembles the rest of the genome. We confirmed the role of L1 repeats in Abp gene duplication with the identification of recombinant L1Md_T elements at the edges of the most recent mouse Abp gene duplication. High densities of L1 and ERVII repeats were found in the Abp gene region with abrupt transitions at the region boundaries, suggesting that their higher densities are tightly associated with Abp gene duplication. We observed that the major accumulation of L1 elements occurred after the split of the mouse and rat lineages and that there is a striking overlap between the timing of L1 accumulation and expansion of the Abp gene family in the mouse genome. Establishing a link between the accumulation of L1 elements and the expansion of the Abp gene family and identification of an NAHR-related breakpoint in

  11. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through

  12. The SPINK gene family and celiac disease susceptibility

    NARCIS (Netherlands)

    Wapenaar, M.C.; Monsuur, A.J.; Poell, J.; Slot, R. van 't; Meijer, J.W.R.; Meijer, G.A.; Mulder, C.J.; Mearin, M.L.; Wijmenga, C.

    2007-01-01

    The gene family of serine protease inhibitors of the Kazal type (SPINK) are functional and positional candidate genes for celiac disease (CD). Our aim was to assess the gut mucosal gene expression and genetic association of SPINK1, -2, -4, and -5 in the Dutch CD population. Gene expression was

  13. The SPINK gene family and celiac disease susceptibility

    NARCIS (Netherlands)

    Wapenaar, Martin C.; Monsuur, Alienke J.; Poell, Jos; Slot, Ruben Van 't; Meijer, Jos W. R.; Meijer, Gerrit A.; Mulder, Chris J.; Mearin, Maria Luisa; Wijmenga, Cisca

    The gene family of serine protease inhibitors of the Kazal type (SPINK) are functional and positional candidate genes for celiac disease (CD). Our aim was to assess the gut mucosal gene expression and genetic association of SPINK1, -2, -4, and -5 in the Dutch CD population. Gene expression was

  14. Array2BIO: from microarray expression data to functional annotation of co-regulated genes

    Directory of Open Access Journals (Sweden)

    Rasley Amy

    2006-06-01

    Background: There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web-based utility. Results: Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci, while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns. Conclusion: We have developed Array2BIO, a web-based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available at http://array2bio.dcode.org.
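
    The two multiple-testing corrections named in this abstract can be sketched in a few lines. This is a generic textbook formulation with made-up p-values, not Array2BIO's own code, which the abstract does not show.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate control)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20]  # made-up p-values for four probes
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

    Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the expected proportion of false discoveries, which is often preferred when thousands of probes are tested at once.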

  15. Mapping and annotating obesity-related genes in pig and human genomes.

    Science.gov (United States)

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are therefore necessary to further dissect the genetics of obesity. The pig has emerged as one of the most attractive models because of its similarity to humans in the mechanisms regulating fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on the STRING protein interaction network. Conclusions. The collected set consists of 361 obesity-related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only 3 human genes have no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity-related genes are mostly involved in regulation and signaling processes/pathways, and relevant connections emerge between obesity-related genes and diseases such as cancer and infectious diseases.

  16. Cloning, annotation and expression analysis of mycoparasitism-related genes in Trichoderma harzianum 88.

    Science.gov (United States)

    Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun

    2013-04-01

    Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.

  17. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis.

    Science.gov (United States)

    Bi, Changwei; Xu, Yiqing; Ye, Qiaolin; Yin, Tongming; Ye, Ning

    2016-01-01

    WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which is found in the promoter regions of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. The whole-genome sequencing of Salix suchowensis gives us the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Based on their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (groups I-III), with five subgroups (IIa-IIe) in group II. Using multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon-intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues: tender roots, young leaves, vegetative buds, non-lignified stems and barks. These analyses of the WRKY gene family in willow not only help complete the functional and annotation information of the WRKY gene family in woody plants, but also provide important references to investigate the expansion and evolution of
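
    As a rough illustration of how the canonical heptapeptide and the three variants named above could be located in protein sequences, the following sketch uses a plain regular-expression scan. The example sequence is invented, and the study itself relied on multiple sequence alignment and manual search rather than this shortcut.

```python
import re
from typing import List, Tuple

# Canonical WRKYGQK heptapeptide plus the three variants listed in the abstract.
WRKY_MOTIFS = ("WRKYGQK", "WRKYGRK", "WKKYGQK", "WRKYGKK")
MOTIF_PATTERN = re.compile("|".join(WRKY_MOTIFS))


def find_wrky_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every heptapeptide hit in a protein sequence."""
    return [(m.start(), m.group()) for m in MOTIF_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Invented sequence carrying one canonical and one variant motif.
    toy_protein = "MSDWRKYGQKAAAWKKYGQKTT"
    print(find_wrky_motifs(toy_protein))  # [(3, 'WRKYGQK'), (13, 'WKKYGQK')]
```

    An exact-match scan like this only finds the four listed heptapeptides; discovering previously unseen variants would still need an alignment-based approach.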

  18. ExpTreeDB: web-based query and visualization of manually annotated gene expression profiling experiments of human and mouse from GEO.

    Science.gov (United States)

    Ni, Ming; Ye, Fuqiang; Zhu, Juanjuan; Li, Zongwei; Yang, Shuai; Yang, Bite; Han, Lu; Wu, Yongge; Chen, Ying; Li, Fei; Wang, Shengqi; Bo, Xiaochen

    2014-12-01

    Numerous public microarray datasets are valuable resources for the scientific communities. Several online tools have made great steps to use these data by querying related datasets with users' own gene signatures or expression profiles. However, dataset annotation and result exhibition still need to be improved. ExpTreeDB is a database that allows for queries on human and mouse microarray experiments from Gene Expression Omnibus with gene signatures or profiles. Compared with similar applications, ExpTreeDB pays more attention to dataset annotations and result visualization. We introduced a multiple-level annotation system to depict and organize original experiments. For example, a tamoxifen-treated cell line experiment is hierarchically annotated as 'agent→drug→estrogen receptor antagonist→tamoxifen'. Consequently, retrieved results are exhibited by an interactive tree-structured graphics, which provide an overview for related experiments and might enlighten users on key items of interest. The database is freely available at http://biotech.bmi.ac.cn/ExpTreeDB. Web site is implemented in Perl, PHP, R, MySQL and Apache. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
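
    The multiple-level annotation described above maps naturally onto a nested tree. The sketch below builds such a tree from arrow-delimited paths like the tamoxifen example; the function name and the dictionary representation are assumptions for illustration and say nothing about how ExpTreeDB stores its annotations.

```python
from typing import Dict, Iterable


def build_annotation_tree(paths: Iterable[str], sep: str = "→") -> Dict[str, dict]:
    """Merge hierarchical annotation paths into one nested dictionary."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for level in path.split(sep):
            # Reuse an existing branch or create a new one at this level.
            node = node.setdefault(level.strip(), {})
    return root


def print_tree(node: Dict[str, dict], depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    for label, children in node.items():
        print("  " * depth + label)
        print_tree(children, depth + 1)


if __name__ == "__main__":
    paths = [
        "agent→drug→estrogen receptor antagonist→tamoxifen",  # from the abstract
        "agent→drug→hypothetical class→hypothetical compound",  # made-up second path
    ]
    print_tree(build_annotation_tree(paths))
```

    Experiments that share a prefix end up under the same branch, which is what makes a tree-structured overview of related experiments possible.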

  19. CRDB: database of chemosensory receptor gene families in vertebrate.

    Directory of Open Access Journals (Sweden)

    Dong Dong

    Chemosensory receptors (CR) are crucial for animals to sense environmental changes and survive. The emergence of whole-genome sequences provides an opportunity to identify entire CR gene repertoires. To gain more insight into the evolution of CR genes in vertebrates, we identified nearly all CR genes in 25 vertebrates using homology-based approaches. Among these CR gene repertoires, nearly half were identified for the first time in previously uncharacterized species, such as the guinea pig, giant panda and elephant. Consistent with previous findings, we found that the numbers of CR genes vary extensively among different species, suggesting an extreme form of 'birth-and-death' evolution. To facilitate CR gene analysis, we constructed a database that provides a resource for CR gene annotation and a web tool for exploring their evolutionary patterns. Besides a search engine for extracting genes from a specific chromosome region, an easy-to-use phylogenetic analysis tool is also provided to facilitate online phylogeny study of CR genes. Our work can provide a rigorous platform for further study on the evolution of CR genes in vertebrates.

  20. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    KAUST Repository

    Lawton, Jennifer

    2012-03-29

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein

  1. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    KAUST Repository

    Lawton, Jennifer; Brugat, Thibaut; Yan, Yam Xue; Reid, Adam James; Böhme, Ulrike; Otto, Thomas Dan; Pain, Arnab; Jackson, Andrew; Berriman, Matthew; Cunningham, Deirdre; Preiser, Peter; Langhorne, Jean

    2012-01-01

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein

  2. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

    Directory of Open Access Journals (Sweden)

    Nan Fu

    BACKGROUND: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database. Large numbers of simple sequence repeats (SSRs) were identified, and the rates of successful amplification and polymorphism were then investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.
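
    To make the notion of an SSR concrete, here is a minimal sketch that finds perfect tandem repeats in a nucleotide sequence with a regular expression. The unit lengths (2 to 6 bp) and the minimum of four repeats are assumptions chosen for illustration; the abstract does not state which tool or thresholds the study used.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, repeat count) for perfect tandem repeats."""
    # A lazily matched unit followed by at least (min_repeats - 1) copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    toy_unigene = "ACGT" + "AG" * 6 + "TTC"  # invented sequence with an (AG)6 repeat
    print(find_ssrs(toy_unigene))  # [(4, 'AG', 6)]
```

    The lazy quantifier makes the shortest repeating unit win, so an (AG)6 tract is reported as six copies of AG rather than three copies of AGAG. Note that this simple pattern also reports long mononucleotide runs as dinucleotide repeats.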

  3. Msx homeobox gene family and craniofacial development.

    Science.gov (United States)

    Alappat, Sylvia; Zhang, Zun Yi; Chen, Yi Ping

    2003-12-01

    Vertebrate Msx genes are unlinked, homeobox-containing genes that bear homology to the Drosophila muscle segment homeobox gene. These genes are expressed at multiple sites of tissue-tissue interactions during vertebrate embryonic development. Inductive interactions mediated by the Msx genes are essential for normal craniofacial, limb and ectodermal organ morphogenesis, and are also essential to survival in mice, as manifested by the phenotypic abnormalities shown in knockout mice and in humans. This review summarizes studies on the expression, regulation, and functional analysis of Msx genes that bear relevance to craniofacial development in humans and mice. Key words: Msx genes, craniofacial, tooth, cleft palate, suture, development, transcription factor, signaling molecule.

  4. Organization and annotation of the Xcat critical region: elimination of seven positional candidate genes.

    Science.gov (United States)

    Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight

    2004-05-01

    Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.

  5. Gene Ontology Terms and Automated Annotation for Energy-Related Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mukhopadhyay, Biswarup [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Tyler, Brett M. [Oregon State Univ., Corvallis, OR (United States); Setubal, Joao [Univ. of Sao Paulo (Brazil); Murali, T. M. [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)

    2017-11-03

    Gene Ontology (GO) is one of the more widely used functional ontologies for describing gene functions at various levels. The project developed 660 GO terms for describing energy-related microbial processes and filled the known gaps in this area of the GO system, and then used these terms to describe functions of 179 genes to showcase the utilities of the new resources. It hosted a series of workshops and made presentations at key meetings to inform and train scientific community members on these terms and to receive inputs from them for the GO term generation efforts. The project has developed a website for storing and displaying the resources (http://www.mengo.biochem.vt.edu/). The outcome of the project was further disseminated through peer-reviewed publications and poster and seminar presentations.

  6. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns are being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high

  7. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    Science.gov (United States)

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  8. Recurrent APC gene mutations in Polish FAP families

    Directory of Open Access Journals (Sweden)

    Pławski Andrzej

    2007-12-01

    The molecular diagnostics of genetically conditioned disorders is based on the identification of mutations in the predisposing genes. Hereditary cancer disorders of the gastrointestinal tract are caused by mutations of tumour suppressor genes or DNA repair genes. The occurrence of recurrent mutations allows molecular diagnostics to be improved. The mutation spectrum in the genes causing hereditary forms of colorectal cancer in the Polish population was previously described. In the present work, the frequency of recurrent mutations of the APC gene was estimated. Eight types of mutations occurred in 19.4% of our FAP families and these constitute 43% of all Polish diagnosed families.

  9. Genomic assessment of the evolution of the prion protein gene family in vertebrates.

    Science.gov (United States)

    Harrison, Paul M; Khachane, Amit; Kumar, Manish

    2010-05-01

    Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified, that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF), in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel, was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage, since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents, is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long

  10. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  11. Gene Expression and Functional Annotation of the Human Ciliary Body Epithelia

    NARCIS (Netherlands)

    Janssen, Sarah F.; Gorgels, Theo G. M. F.; Bossers, Koen; ten Brink, Jacoline B.; Essing, Anke H. W.; Nagtegaal, Martijn; van der Spek, Peter J.; Jansonius, Nomdo M.; Bergen, Arthur A. B.

    2012-01-01

    Purpose: The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecular

  12. Functional annotation of rare gene aberration drivers of pancreatic cancer | Office of Cancer Genomics

    Science.gov (United States)

    As we enter the era of precision medicine, characterization of cancer genomes will directly influence therapeutic decisions in the clinic. Here we describe a platform enabling functionalization of rare gene mutations through their high-throughput construction, molecular barcoding and delivery to cancer models for in vivo tumour driver screens. We apply these technologies to identify oncogenic drivers of pancreatic ductal adenocarcinoma (PDAC).

  13. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    Science.gov (United States)

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half a Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high-quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database, and around two-thirds of them matched proteins of cucumber, the most closely related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality, including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression

  14. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
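    The abstract above builds on association rule mining. As a rough illustration only, the sketch below computes the classical support and confidence of a rule over per-gene sets of GO term labels. It does not reproduce the MOSupport or MOConfidence measures, whose definitions are not given in this record; the function names and the toy annotations are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every term in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Classical confidence of the rule antecedent -> consequent."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    ante_support = support(transactions, ante)
    if ante_support == 0.0:
        return 0.0
    return support(transactions, both) / ante_support


# Hypothetical toy data: one set of GO-style term labels per gene product.
annotations = [
    frozenset({"MF:kinase activity", "BP:phosphorylation"}),
    frozenset({"MF:kinase activity", "BP:phosphorylation", "CC:cytoplasm"}),
    frozenset({"MF:kinase activity", "CC:nucleus"}),
    frozenset({"MF:DNA binding", "CC:nucleus"}),
]

rule_support = support(annotations, {"MF:kinase activity", "BP:phosphorylation"})
rule_confidence = confidence(
    annotations, {"MF:kinase activity"}, {"BP:phosphorylation"}
)
```

    A multi-level approach would additionally need the ontology hierarchy so that an annotation to a child term also counts toward its ancestors; that step is left out of this sketch.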

  15. Genome-wide annotation of porcine microRNA genes and transcriptome profiling during Actinobacillus infection

    DEFF Research Database (Denmark)

    Nielsen, Mathilde

    MicroRNAs are small single-stranded non-coding RNA molecules that contribute to the regulation of gene expression, primarily by binding to the 3' end of protein-coding mRNA, thereby inhibiting translation or promoting degradation of the mRNA. The main focus of this PhD project was to ex...

  16. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  17. Genome, Functional Gene Annotation, and Nuclear Transformation of the Heterokont Oleaginous Alga Nannochloropsis oceanica CCMP1779

    Science.gov (United States)

    2012-11-15

    development of such an algal model system for basic discovery, we sequenced the genome and two sets of transcriptomes of N. oceanica CCMP1779, assembled ... CCMP1779 has a gene encoding a highly conserved violaxanthin de-epoxidase (VDE) protein like that found in plants (Table S9). In Arabidopsis, VDE is ... HLA3 or LCI1 were present. This result suggests that CCMP1779 might have a plastid Ci transport system similar to that of Chlamydomonas, but a distinct

  18. Closing the loop: from paper to protein annotation using supervised Gene Ontology classification.

    Science.gov (United States)

    Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

    2014-01-01

    Gene function curation of the literature with Gene Ontology (GO) concepts is one particularly time-consuming task in genomics, and the help from bioinformatics is highly requested to keep up with the flow of publications. In 2004, the first BioCreative challenge already designed a task of automatic GO concepts assignment from a full text. At this time, results were judged far from reaching the performances required by real curation workflows. In particular, supervised approaches produced the most disappointing results because of lack of training data. Ten years later, the available curation data have massively grown. In 2013, the BioCreative IV GO task revisited the automatic GO assignment task. For this issue, we investigated the power of our supervised classifier, GOCat. GOCat computes similarities between an input text and already curated instances contained in a knowledge base to infer GO concepts. The subtask A consisted in selecting GO evidence sentences for a relevant gene in a full text. For this, we designed a state-of-the-art supervised statistical approach, using a naïve Bayes classifier and the official training set, and obtained fair results. The subtask B consisted in predicting GO concepts from the previous output. For this, we applied GOCat and reached leading results, up to 65% for hierarchical recall in the top 20 outputted concepts. Contrary to previous competitions, machine learning has this time outperformed standard dictionary-based approaches. Thanks to BioCreative IV, we were able to design a complete workflow for curation: given a gene name and a full text, this system is able to select evidence sentences for curation and to deliver highly relevant GO concepts. Contrary to previous competitions, machine learning this time outperformed dictionary-based systems. Observed performances are sufficient for being used in a real semiautomatic curation workflow. GOCat is available at http://eagl.unige.ch/GOCat/. http://eagl.unige.ch/GOCat4FT/.

  19. Analysis of the leaf transcriptome of Musa acuminata during interaction with Mycosphaerella musicola: gene assembly, annotation and marker development.

    Science.gov (United States)

    Passos, Marco A N; de Cruz, Viviane Oliveira; Emediato, Flavia L; de Teixeira, Cristiane Camargo; Azevedo, Vânia C Rennó; Brasileiro, Ana C M; Amorim, Edson P; Ferreira, Claudia F; Martins, Natalia F; Togawa, Roberto C; Júnior, Georgios J Pappas; da Silva, Orzenil Bonfim; Miller, Robert N G

    2013-02-05

    Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. A large set
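    The record above reports polymorphism information content (PIC) per locus. The sketch below uses the commonly cited formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the record does not state which formula the authors used, so this choice is an assumption, and the allele frequencies in the example are made up.

```python
from itertools import combinations
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC for one locus from allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction term for uninformative matings between identical heterozygotes.
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical locus with four equally frequent alleles.
example_pic = polymorphism_information_content([0.25, 0.25, 0.25, 0.25])
```

    Under this formula a locus with many alleles at similar frequencies scores close to 1, while a locus dominated by one allele scores close to 0.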

  20. Interferon induced IFIT family genes in host antiviral defense.

    Science.gov (United States)

    Zhou, Xiang; Michal, Jennifer J; Zhang, Lifan; Ding, Bo; Lunney, Joan K; Liu, Bang; Jiang, Zhihua

    2013-01-01

    Secretion of interferons (IFNs) from virus-infected cells is a hallmark of host antiviral immunity and in fact, IFNs exert their antiviral activities through the induction of antiviral proteins. The IFN-induced protein with tetratricopeptide repeats (IFITs) family is among hundreds of IFN-stimulated genes. This family contains a cluster of duplicated loci. Most mammals have IFIT1, IFIT2, IFIT3 and IFIT5; however, birds, marsupials, frogs and fish have only IFIT5. Regardless of species, IFIT5 is always adjacent to SLC16A12. IFIT family genes are predominantly induced by type I and type III interferons and are regulated by the pattern recognition and the JAK-STAT signaling pathways. IFIT family proteins are involved in many processes in response to viral infection. However, some viruses can escape the antiviral functions of the IFIT family by suppressing IFIT family gene expression or by methylation of the 5' cap of viral molecules. In addition, variants of IFIT family genes could significantly influence the outcome of hepatitis C virus (HCV) therapy. We believe that our current review provides a comprehensive picture for the community to understand the structure and function of IFIT family genes in response to pathogens in humans, as well as in other animals.

  1. Annotation Of Novel And Conserved MicroRNA Genes In The Build 10 Sus scrofa Reference Genome And Determination Of Their Expression Levels In Ten Different Tissues

    DEFF Research Database (Denmark)

    Thomsen, Bo; Nielsen, Mathilde; Hedegaard, Jakob

    The DNA template used in the pig genome sequencing project was provided by a Duroc pig named TJ Tabasco. In an effort to annotate microRNA (miRNA) genes in the reference genome we have conducted deep sequencing to determine the miRNA transcriptomes in ten different tissues isolated from Pinky, a genetically identical clone of TJ Tabasco. The purpose was to generate miRNA sequences that are highly homologous to the reference genome sequence, which along with computational prediction will improve confidence in the genomic annotation of miRNA genes. Based on homology searches of the sequence data against miRBase, we identified more than 600 conserved known miRNA/miRNA*, which is a significant increase relative to the 211 porcine miRNA/miRNA* deposited in the current version of miRBase. Furthermore, the genome-wide transcript profiles provided important information on the relative abundance...

  2. Differential Gene Expression in the Otic Capsule and the Middle Ear-An Annotation of Bone-Related Signaling Genes

    DEFF Research Database (Denmark)

    Nielsen, Michelle C.; Martin-Bertelsen, Tomas; Friis, Morten

    2015-01-01

    Hypothesis: A number of bone-related genes may be responsible for the unique suppression of perilabyrinthine bone remodeling. Background: Bone remodeling is highly inhibited around the inner ear space most likely because of osteoprotegerin (OPG), which is a well-known potent inhibitor of osteocla...

  3. Transcriptome-wide survey of mouse CNS-derived cells reveals monoallelic expression within novel gene families.

    Directory of Open Access Journals (Sweden)

    Sierra M Li

    Full Text Available Monoallelic expression is an integral component of regulation of a number of essential genes and gene families. To probe for allele-specific expression in cells of CNS origin, we used next-generation sequencing (RNA-seq) to analyze four clonal neural stem cell (NSC) lines derived from Mus musculus C57BL/6 (B6) × Mus musculus molossinus (JF1) adult female mice. We established a JF1 cSNP library, then ascertained transcriptome-wide expression from B6 vs. JF1 alleles in the NSC lines. Validating the assay, we found that 262 of 268 X-linked genes evaluable in at least one cell line showed monoallelic expression (at least 85% expression of the predominant allele, p-value<0.05). For autosomal genes, 170 of 7,198 genes (2.4% of the total) showed monoallelic expression in at least 2 evaluable cell lines. The group included eight known imprinted genes with the expected pattern of allele-specific expression. Among the other autosomal genes with monoallelic expression were five members of the glutathione transferase gene superfamily, which processes xenobiotic compounds as well as carcinogens and cancer therapeutic agents. Monoallelic expression within this superfamily thus may play a functional role in the response to diverse and potentially lethal exogenous factors, as is the case for the immunoglobulin and olfactory receptor superfamilies. Other genes and gene families showing monoallelic expression include the annexin gene family and the Thy1 gene, both linked to inflammation and cancer, as well as genes linked to alcohol dependence (Gabrg1) and epilepsy (Kcnma1). The annotated set of genes will provide a resource for investigation of mechanisms underlying certain cases of these and other major disorders.
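    The monoallelic call described above (at least 85% of expression from the predominant allele, p-value < 0.05) can be sketched as follows. The record does not say which statistical test produced the p-value; an exact two-sided binomial test against a 50:50 allelic ratio is assumed here purely for illustration, and the function names and read counts are hypothetical.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    def pmf(i: int) -> float:
        log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_coeff + i * log(p) + (n - i) * log(1.0 - p))

    observed = pmf(k)
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    total = sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed * (1 + 1e-7))
    return min(1.0, total)


def is_monoallelic(
    allele_a_reads: int,
    allele_b_reads: int,
    min_fraction: float = 0.85,  # threshold quoted in the abstract
    alpha: float = 0.05,         # significance level quoted in the abstract
) -> bool:
    """True if the predominant allele passes both the fraction and p-value cutoffs."""
    n = allele_a_reads + allele_b_reads
    if n == 0:
        return False  # gene not evaluable
    major = max(allele_a_reads, allele_b_reads)
    return major / n >= min_fraction and binomial_two_sided_p(major, n) < alpha


# Hypothetical read counts at one coding SNP.
call_skewed = is_monoallelic(18, 2)     # 90% from one allele, well supported
call_balanced = is_monoallelic(11, 9)   # near 50:50
call_low_depth = is_monoallelic(3, 0)   # 100% but too few reads to be significant
```

    Requiring both cutoffs keeps low-coverage genes from being called monoallelic on the basis of a handful of reads.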

  4. The ALMT Gene Family Performs Multiple Functions in Plants

    Directory of Open Access Journals (Sweden)

    Jie Liu

    2018-02-01

    Full Text Available The aluminium activated malate transporter (ALMT) gene family is named after the first member of the family identified in wheat (Triticum aestivum L.). The product of this gene controls resistance to aluminium (Al) toxicity. ALMT genes encode transmembrane proteins that function as anion channels and perform multiple functions involving the transport of organic anions (e.g., carboxylates) and inorganic anions in cells. They share a PF11744 domain and are classified in the Fusaric acid resistance protein-like superfamily, CL0307. The proteins typically have five to seven transmembrane regions in the N-terminal half and a long hydrophilic C-terminal tail, but predictions of secondary structure vary. Although widely spread in plants, relatively little information is available on the roles performed by other members of this family. In this review, we summarized functions of ALMT gene families, including Al resistance, stomatal function, mineral nutrition, microbe interactions, fruit acidity, light response and seed development.

  5. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    OpenAIRE

    Inglis, Diane O; Binkley, Jonathan; Skrzypek, Marek S; Arnaud, Martha B; Cerqueira, Gustavo C; Shah, Prachi; Wymore, Farrell; Wortman, Jennifer R; Sherlock, Gavin

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel s...

  6. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease

    Directory of Open Access Journals (Sweden)

    Maria V. Fernández

    2018-04-01

    Full Text Available Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.

  7. Molecular evolution of the major chemosensory gene families in insects.

    Science.gov (United States)

    Sánchez-Gracia, A; Vieira, F G; Rozas, J

    2009-09-01

    Chemoreception is a crucial biological process that is essential for the survival of animals. In insects, olfaction allows the organism to recognise volatile cues that allow the detection of food, predators and mates, whereas the sense of taste commonly allows the discrimination of soluble stimulants that elicit feeding behaviours and can also initiate innate sexual and reproductive responses. The most important proteins involved in the recognition of chemical cues comprise moderately sized multigene families. These families include odorant-binding proteins (OBPs) and chemosensory proteins (CSPs), which are involved in peripheral olfactory processing, and the chemoreceptor superfamily formed by the olfactory receptor (OR) and gustatory receptor (GR) families. Here, we review some recent evolutionary genomic studies of chemosensory gene families using the data from fully sequenced insect genomes, especially from the 12 newly available Drosophila genomes. Overall, the results clearly support the birth-and-death model as the major mechanism of evolution in these gene families. Namely, new members arise by tandem gene duplication, progressively diverge in sequence and function, and can eventually be lost from the genome by a deletion or pseudogenisation event. Adaptive changes fostered by environmental shifts are also observed in the evolution of chemosensory families in insects and likely involve reproductive, ecological or behavioural traits. Consequently, the current size of these gene families is mainly a result of random gene gain and loss events. This dynamic process may represent a major source of genetic variation, providing opportunities for future specific adaptations.
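    The birth-and-death model mentioned above treats family size as the outcome of random gene gains (duplication) and losses (deletion or pseudogenisation). The toy simulation below is only a sketch of that idea; the discrete time steps, the per-gene probabilities and every parameter value are assumptions for illustration and are not taken from the reviewed studies.

```python
import random
from typing import List


def simulate_family_size(
    start_size: int,
    birth_prob: float,
    death_prob: float,
    steps: int,
    seed: int = 0,
) -> List[int]:
    """Track gene family size over time under a simple birth-and-death process.

    In each step every gene copy independently duplicates with probability
    birth_prob and is lost with probability death_prob.
    """
    rng = random.Random(seed)
    size = start_size
    history = [size]
    for _ in range(steps):
        gains = sum(1 for _ in range(size) if rng.random() < birth_prob)
        losses = sum(1 for _ in range(size) if rng.random() < death_prob)
        size = max(0, size + gains - losses)
        history.append(size)
    return history


# Hypothetical run: 20 starting copies, equal gain and loss probabilities.
trajectory = simulate_family_size(20, birth_prob=0.05, death_prob=0.05, steps=100)
```

    Running the same parameters with different seeds gives different final sizes, which mirrors the point that current family size can largely reflect random gain and loss.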

  8. Genomewide analysis of TCP transcription factor gene family in ...

    Indian Academy of Sciences (India)

    Journal of Genetics; Volume 93; Issue 3. Teosinte branched1/cycloidea/proliferating cell factor1 (TCP) proteins are a large family of transcriptional regulators in angiosperms. They are ... To the best of our knowledge, this is the first study of a genomewide analysis of the apple TCP gene family.

  9. Identification of metalloprotease gene families in sugarcane

    Directory of Open Access Journals (Sweden)

    O.H.P. Ramos

    2001-12-01

    Full Text Available Metalloproteases play a key role in many physiological processes in mammals such as cell migration, tissue remodeling and processing of growth factors. They have also been identified as important factors in the patho-physiology of a number of human diseases, including cancer and hypertension. Many bacterial pathogens rely on proteases in order to infect the host. Several classes of metalloproteases have been described in humans, bacteria, snake venoms and insects. However, the presence and characterization of plant metalloproteases have rarely been described in the literature. In our research, we searched the sugarcane expressed sequence tag (SUCEST) DNA library in order to identify, by homology with sequences deposited in other databases, metalloprotease gene families expressed under different conditions. Protein sequences from Arabidopsis thaliana and Glycine max were used to search the SUCEST data bank. Conserved regions corresponding to different metalloprotease domains and sequence motifs were identified in the reads to characterize each group of enzymes. At least four classes of sugarcane metalloproteases have been identified, i.e. matrix metalloproteases, zincins, inverzincins, and ATP-dependent metalloproteases. Each enzyme class was analyzed for its expression in different conditions and tissues.

  10. Genome organization and expression of the rat ACBP gene family

    DEFF Research Database (Denmark)

    Mandrup, S; Andreasen, P H; Knudsen, J

    1993-01-01

    pool former. We have molecularly cloned and characterized the rat ACBP gene family which comprises one expressed and four processed pseudogenes. One of these was shown to exist in two allelic forms. A comprehensive computer-aided analysis of the promoter region of the expressed ACBP gene revealed...

  11. APC gene mutations and extraintestinal phenotype of familial adenomatous polyposis

    NARCIS (Netherlands)

    Giardiello, F. M.; Petersen, G. M.; Piantadosi, S.; Gruber, S. B.; Traboulsi, E. I.; Offerhaus, G. J.; Muro, K.; Krush, A. J.; Booker, S. V.; Luce, M. C.; Laken, S. J.; Kinzler, K. W.; Vogelstein, B.; Hamilton, S. R.

    1997-01-01

    Familial adenomatous polyposis (FAP) is caused by germline mutation of the adenomatous polyposis coli (APC) gene on chromosome 5q. This study assessed genotype-phenotype correlations for extraintestinal lesions in FAP. Mutations of the APC gene were compared with the occurrence of seven

  12. LeARN: a platform for detecting, clustering and annotating non-coding RNAs

    Directory of Open Access Journals (Sweden)

    Schiex Thomas

    2008-01-01

    Full Text Available Abstract Background In the last decade, sequencing projects have led to the development of a number of annotation systems dedicated to the structural and functional annotation of protein-coding genes. These annotation systems manage the annotation of the non-protein coding genes (ncRNAs) in a very crude way, allowing neither the editing of the secondary structures nor the clustering of ncRNA genes into families, both of which are crucial for appropriate annotation of these molecules. Results LeARN is a flexible software package which handles the complete process of ncRNA annotation by integrating the layers of automatic detection and human curation. Conclusion This software provides the infrastructure to deal properly with ncRNAs in the framework of any annotation project. It fills the gap between existing prediction software, which detects independent ncRNA occurrences, and public ncRNA repositories, which do not offer the flexibility and interactivity required for annotation projects. The software is freely available from the download section of the website http://bioinfo.genopole-toulouse.prd.fr/LeARN

  13. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
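    One simple way to picture combining evidence from several TE predictors is to merge overlapping predicted intervals into candidate regions and record which tools support each region. The sketch below illustrates only that general idea and is not the pipeline described in the record; the tool labels, coordinates and function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one sequence


def combine_evidence(
    predictions: Dict[str, List[Interval]],
) -> List[Tuple[int, int, Set[str]]]:
    """Merge overlapping or abutting intervals from all tools into candidate
    regions, keeping the set of tools that contributed to each region."""
    tagged = sorted(
        (start, end, tool)
        for tool, intervals in predictions.items()
        for start, end in intervals
    )
    merged: List[Tuple[int, int, Set[str]]] = []
    for start, end, tool in tagged:
        if merged and start <= merged[-1][1]:
            last_start, last_end, tools = merged[-1]
            merged[-1] = (last_start, max(last_end, end), tools | {tool})
        else:
            merged.append((start, end, {tool}))
    return merged


# Hypothetical predictions from three unnamed tools on one scaffold.
candidate_regions = combine_evidence(
    {
        "tool_a": [(100, 200), (500, 600)],
        "tool_b": [(150, 250)],
        "tool_c": [(580, 700), (900, 950)],
    }
)
```

    Regions supported by several independent tools could then be prioritised, while single-tool regions are natural candidates for manual curation.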

  14. The ABC gene family in arthropods: comparative genomics and role in insecticide transport and resistance.

    Science.gov (United States)

    Dermauw, Wannes; Van Leeuwen, Thomas

    2014-02-01

    About 100 years ago, the Drosophila white mutant marked the birth of Drosophila genetics. The white gene turned out to encode the first well studied ABC transporter in arthropods. The ABC gene family is now recognized as one of the largest transporter families in all kingdoms of life. The majority of ABC proteins function as primary-active transporters that bind and hydrolyze ATP while transporting a large diversity of substrates across lipid membranes. Although extremely well studied in vertebrates for their role in drug resistance, less is known about the role of this family in the transport of endogenous and exogenous substances in arthropods. The ABC families of five insect species, a crustacean and a chelicerate have been annotated in some detail. We conducted a thorough phylogenetic analysis of the seven arthropod and human ABC protein subfamilies, to infer orthologous relationships that might suggest conserved function. Most orthologous relationships were found in the ABCB half transporter, ABCD, ABCE and ABCF subfamilies, but specific expansions within species and lineages are frequently observed and discussed. We next surveyed the role of ABC transporters in the transport of xenobiotics/plant allelochemicals and their involvement in insecticide resistance. The involvement of ABC transporters in xenobiotic resistance in arthropods is historically not well documented, but an increasing number of studies using unbiased differential gene expression analysis now points to their importance. We give an overview of methods that can be used to link ABC transporters to resistance. ABC proteins have also recently been implicated in the mode of action and resistance to Bt toxins in Lepidoptera. Given the enormous interest in Bt toxicology in transgenic crops, such findings will provide an impetus to further reveal the role of ABC transporters in arthropods. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Baumgarten Andrew

    2004-06-01

    Full Text Available Abstract Background Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses. Results Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions. Conclusions Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate contributions of tandem duplication and segmental genome duplication in the generation and maintenance of gene families. These differences appear to correspond meaningfully to differences in functional roles of the members of the gene families.
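    The record above states that gene family size closely follows a power-law distribution. As a rough sketch of how such a claim can be checked, the code below fits count = c * size^(-alpha) by least squares on log-transformed values. The data are synthetic, generated from an exact power law rather than taken from the study, and the fitting method is an assumption; log-log regression is a crude estimator and other approaches such as maximum likelihood may be preferable.

```python
from math import exp, log
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = c * x**(-alpha) by ordinary least squares in log-log space.

    Returns (alpha, c). All xs and ys must be positive.
    """
    log_x = [log(x) for x in xs]
    log_y = [log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, exp(intercept)


# Synthetic example: number of families of each size, from an exact power law.
family_sizes = [1, 2, 4, 8, 16, 32]
family_counts = [1000.0 * s ** -2.0 for s in family_sizes]
alpha_hat, c_hat = fit_power_law(family_sizes, family_counts)
```

    On real counts the points scatter around the fitted line, so the quality of the fit should be examined before the distribution is described as a power law.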

  16. Functional annotation of rheumatoid arthritis and osteoarthritis associated genes by integrative genome-wide gene expression profiling analysis.

    Directory of Open Access Journals (Sweden)

    Zhan-Chun Li

    Full Text Available BACKGROUND: Rheumatoid arthritis (RA) and osteoarthritis (OA) are two major types of joint diseases that share multiple common symptoms. However, their pathological mechanisms remain largely unknown. The aim of our study is to identify RA- and OA-related genes and gain an insight into the underlying genetic basis of these diseases. METHODS: We collected 11 whole genome-wide expression profiling datasets from RA and OA cohorts and performed a meta-analysis to comprehensively investigate their expression signatures. This method can avoid some pitfalls of single dataset analyses. RESULTS AND CONCLUSION: We found that several biological pathways (i.e., the immunity, inflammation and apoptosis related pathways) are commonly involved in the development of both RA and OA, whereas several other pathways (i.e., vasopressin-related pathway, regulation of autophagy, endocytosis, calcium transport and endoplasmic reticulum stress related pathways) present significant differences between RA and OA. This study provides novel insights into the molecular mechanisms underlying these diseases, thereby aiding their diagnosis and treatment.

  17. Evolution of the YABBY gene family in seed plants.

    Science.gov (United States)

    Finet, Cédric; Floyd, Sandra K; Conway, Stephanie J; Zhong, Bojian; Scutt, Charles P; Bowman, John L

    2016-01-01

    Members of the YABBY gene family of transcription factors in angiosperms have been shown to be involved in the initiation of outgrowth of the lamina, the maintenance of polarity, and establishment of the leaf margin. Although most of the dorsal-ventral polarity genes in seed plants have homologs in non-spermatophyte lineages, the presence of YABBY genes is restricted to seed plants. To gain insight into the origin and diversification of this gene family, we reconstructed the evolutionary history of YABBY gene lineages in seed plants. Our findings suggest that either one or two YABBY genes were present in the last common ancestor of extant seed plants. We also examined the expression of YABBY genes in the gymnosperms Ephedra distachya (Gnetales), Ginkgo biloba (Ginkgoales), and Pseudotsuga menziesii (Coniferales). Our data indicate that some YABBY genes are expressed in a polar (abaxial) manner in leaves and female cones in gymnosperms. We propose that YABBY genes already acted as polarity genes in the last common ancestor of extant seed plants. © 2016 Wiley Periodicals, Inc.

  18. Molecular Evolution of the Glycosyltransferase 6 Gene Family in Primates

    Directory of Open Access Journals (Sweden)

    Eliane Evanovich

    2016-01-01

    Full Text Available The glycosyltransferase 6 (GT6) gene family includes the ABO, Ggta1, iGb3S, and GBGT1 genes and three putative genes restricted to mammals, GT6m6, GTm6, and GT6m7; only the last of these is found in primates. GT6 genes may encode functional and nonfunctional proteins. The Ggta1 and GBGT1 genes, for instance, are pseudogenes in catarrhine primates, while the iGb3S gene is inactive only in human, bonobo, and chimpanzee. Even when inactivated, these genes tend to be conserved in primates. As some of the GT6 genes are related to susceptibility or resistance to parasites, we investigated (i) the selective pressure on the GT6 paralogous genes in primates; (ii) the basis of the conservation of iGb3S in human, chimpanzee, and bonobo; and (iii) the functional potential of GBGT1 and GT6m7 in catarrhines. We observed that purifying selection is prevalent and that these genes have low diversity, though the ABO and Ggta1 genes have some sites under positive selection. GT6m7, a putative gene associated with aggressive periodontitis, may have a regulatory function, but experimental studies are needed to assess its function. The evolutionary conservation of iGb3S in human, chimpanzee, and bonobo seems to be the result of proximity to genes with important biological functions.

  19. Identification and in silico analysis of the Citrus HSP70 molecular chaperone gene family

    Directory of Open Access Journals (Sweden)

    Luciano G. Fietto

    2007-01-01

    Full Text Available The completion of the genome sequencing of the Arabidopsis thaliana model system provided a powerful molecular tool for comparative analysis of gene families present in the genomes of economically relevant plant species. In this investigation, we used the sequences of the Arabidopsis Hsp70 gene family to identify and annotate the Citrus Hsp70 genes represented in the CitEST database. Based on sequence comparison analysis, we identified 18 clusters that were further divided into 5 subgroups encoding four mitochondrial mtHsp70s, three plastid csHsp70s, one ER luminal Hsp70 BiP, two HSP110/SSE-related proteins and eight cytosolic Hsp/Hsc70s. We also analyzed the expression profile of each Hsp70 transcript by digital Northern in different organs and in response to stress conditions. The EST database revealed a distinct population distribution of Hsp70 ESTs among isoforms and across the organs surveyed. The Hsp70-5 isoform was highly expressed in seeds, whereas BiP, mitochondrial and plastid Hsp70 mRNAs displayed a similar expression profile in the organs analyzed, and were predominantly represented in flowers. Distinct Hsp70 mRNAs were also differentially expressed during Xylella infection and Citrus tristeza viral infection as well as during water deficit. This in silico study lays the groundwork for future investigations to functionally characterize the Citrus Hsp70 family and underscores the relevance of Hsp70s in response to abiotic and biotic stresses in Citrus.

  20. msh/Msx gene family in neural development.

    Science.gov (United States)

    Ramos, Casto; Robert, Benoît

    2005-11-01

    The involvement of Msx homeobox genes in skull and tooth formation has received a great deal of attention. Recent studies also indicate a role for the msh/Msx gene family in development of the nervous system. In this article, we discuss the functions of these transcription factors in neural-tissue organogenesis. We will deal mainly with the interactions of the Drosophila muscle segment homeobox (msh) gene with other homeobox genes and the repressive cascade that leads to neuroectoderm patterning; the role of Msx genes in neural-crest induction, focusing especially on the differences between lower and higher vertebrates; their implication in patterning of the vertebrate neural tube, particularly in diencephalon midline formation. Finally, we will examine the distinct activities of Msx1, Msx2 and Msx3 genes during neurogenesis, taking into account their relationships with signalling molecules such as BMP.

  1. The sieve element occlusion gene family in dicotyledonous plants.

    Science.gov (United States)

    Ernst, Antonia M; Rüping, Boris; Jekat, Stephan B; Nordzieke, Steffen; Reineke, Anna R; Müller, Boje; Bornberg-Bauer, Erich; Prüfer, Dirk; Noll, Gundula A

    2011-01-01

    Sieve element occlusion (SEO) genes encoding forisome subunits have been identified in Medicago truncatula and other legumes. Forisomes are structural phloem proteins uniquely found in Fabaceae sieve elements. They undergo a reversible conformational change after wounding, from a condensed to a dispersed state, thereby blocking sieve tube translocation and preventing the loss of photoassimilates. Recently, we identified SEO genes in several non-Fabaceae plants (lacking forisomes) and concluded that they most probably encode conventional non-forisome P-proteins. Molecular and phylogenetic analysis of the SEO gene family has identified domains that are characteristic for SEO proteins. Here, we extended our phylogenetic analysis by including additional SEO genes from several diverse species based on recently published genomic data. Our results strengthen the original assumption that SEO genes seem to be widespread in dicotyledonous angiosperms, and further underline the divergent evolution of SEO genes within the Fabaceae.

  2. Evolution of the vertebrate insulin receptor substrate (Irs) gene family.

    Science.gov (United States)

    Al-Salam, Ahmad; Irwin, David M

    2017-06-23

    Insulin receptor substrate (Irs) proteins are essential for insulin signaling as they allow downstream effectors to dock with, and be activated by, the insulin receptor. A family of four Irs proteins has been identified in mice; however, the gene for one of these, IRS3, has been pseudogenized in humans. While it is known that the Irs gene family originated in vertebrates, it is not known when it originated and which members are most closely related to each other. A better understanding of the evolution of Irs genes and proteins should provide insight into the regulation of metabolism by insulin. Multiple genes for Irs proteins were identified in a wide variety of vertebrate species. Phylogenetic and genomic neighborhood analyses indicate that this gene family originated very early in vertebrate evolution. Most Irs genes were duplicated and retained in fish after the fish-specific genome duplication. Irs genes have been lost in various lineages, including Irs3 in primates and birds and Irs1 in most fish. Irs3 and Irs4 experienced an episode of more rapid protein sequence evolution on the ancestral mammalian lineage. Comparisons of the conservation of the protein sequences among Irs paralogs show that domains involved in binding to the plasma membrane and insulin receptors are most strongly conserved, while divergence has occurred in sequences involved in interacting with downstream effector proteins. The Irs gene family originated very early in vertebrate evolution, likely through genome duplications, and in parallel with duplications of other components of the insulin signaling pathway, including insulin and the insulin receptor. While the N-terminal sequences of these proteins are conserved among the paralogs, changes in the C-terminal sequences likely allowed changes in biological function.

  3. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society
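
    The kind of model described in this abstract can be illustrated with a short simulation. The Python sketch below is a toy version under stated assumptions, not the authors' published model: a genome starts from a set of precursor genes, each step duplicates a gene chosen uniformly at random, and with probability `p_diverge` the copy is treated as having accumulated enough point mutations to found a new family. The names `simulate_family_sizes`, `n_precursors`, `target_size` and `p_diverge` are hypothetical.

```python
import random
from collections import Counter

def simulate_family_sizes(n_precursors=50, target_size=2000, p_diverge=0.1, seed=0):
    """Grow a toy genome by random gene duplication.

    Each step picks one gene uniformly at random and duplicates it. With
    probability p_diverge the copy is assumed to have mutated beyond
    recognisable homology and founds a new family; otherwise it joins
    its parent's family. Returns a Counter: family size -> number of families.
    """
    rng = random.Random(seed)
    family_of = list(range(n_precursors))  # family id of every gene in the genome
    next_family = n_precursors
    while len(family_of) < target_size:
        parent_family = rng.choice(family_of)  # large families are picked more often
        if rng.random() < p_diverge:
            family_of.append(next_family)
            next_family += 1
        else:
            family_of.append(parent_family)
    sizes = Counter(family_of)        # family id -> number of member genes
    return Counter(sizes.values())    # family size -> number of families

if __name__ == "__main__":
    hist = simulate_family_sizes()
    for size in sorted(hist)[:10]:
        print(size, hist[size])
```

    Because a gene is drawn uniformly, a family is duplicated in proportion to its current size, which is what skews the resulting size distribution toward a few large families and many small ones.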

  4. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  5. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Science.gov (United States)

    Bargsten, Joachim W; Folta, Adam; Mlynárová, Ludmila; Nap, Jan-Peter

    2013-01-01

    As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  6. Regulatory patterns of a large family of defensin-like genes expressed in nodules of Medicago truncatula.

    Directory of Open Access Journals (Sweden)

    Sumitha Nallu

    Full Text Available Root nodules are the symbiotic organ of legumes that house nitrogen-fixing bacteria. Many genes are specifically induced in nodules during the interactions between the host plant and symbiotic rhizobia. Information regarding the regulation of expression for most of these genes is lacking. One of the largest gene families expressed in the nodules of the model legume Medicago truncatula is the nodule cysteine-rich (NCR) group of defensin-like (DEFL) genes. We used a custom Affymetrix microarray to catalog the expression changes of 566 NCRs at different stages of nodule development. Additionally, bacterial mutants were used to understand the importance of the rhizobial partners in induction of NCRs. Expression of early NCRs was detected during the initial infection of rhizobia in nodules and expression continued as nodules became mature. Late NCRs were induced concomitantly with bacteroid development in the nodules. The induction of early and late NCRs was correlated with the number and morphology of rhizobia in the nodule. Conserved 41 to 50 bp motifs identified in the upstream 1,000 bp promoter regions of NCRs were required for promoter activity. These cis-element motifs were found to be unique to the NCR family among all annotated genes in the M. truncatula genome, although they contain sub-regions with clear similarity to known regulatory motifs involved in nodule-specific expression and temporal gene regulation.

  7. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Directory of Open Access Journals (Sweden)

    Joachim W Bargsten

    Full Text Available As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  8. Jannovar: a java library for exome annotation.

    Science.gov (United States)

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
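
    The interval-tree lookup mentioned in this abstract can be sketched as follows. This is a minimal Python illustration of a centered interval tree, not Jannovar's Java implementation: the class name `IntervalTree`, the tuple layout `(start, end, label)` and the transcript coordinates are hypothetical, and only single-position queries on closed intervals are handled.

```python
class IntervalTree:
    """Centered interval tree over closed intervals given as (start, end, label)."""

    def __init__(self, intervals):
        self.center = None
        self.left = self.right = None
        self.by_start = []
        self.by_end = []
        if not intervals:
            return
        # Use the median endpoint as the center, so at least one interval contains it.
        endpoints = sorted(p for s, e, _ in intervals for p in (s, e))
        self.center = endpoints[len(endpoints) // 2]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.by_start = sorted(here, key=lambda iv: iv[0])
        self.by_end = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def query(self, pos):
        """Return the labels of all intervals that contain position pos."""
        if self.center is None:
            return []
        hits = []
        if pos <= self.center:
            # Intervals stored here all end at or after center, so only starts matter.
            for start, _, label in self.by_start:
                if start > pos:
                    break
                hits.append(label)
            if pos < self.center and self.left:
                hits += self.left.query(pos)
        else:
            # Intervals stored here all start at or before center, so only ends matter.
            for _, end, label in self.by_end:
                if end < pos:
                    break
                hits.append(label)
            if self.right:
                hits += self.right.query(pos)
        return hits

if __name__ == "__main__":
    # Hypothetical transcript coordinates on one chromosome.
    transcripts = [(100, 500, "tx1"), (300, 900, "tx2"), (1000, 1200, "tx3")]
    tree = IntervalTree(transcripts)
    print(sorted(tree.query(350)))
```

    Each query visits one root-to-leaf path plus the matching intervals, which is why this structure suits looking up many variants against a fixed transcript set.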

  9. De novo cloning and annotation of genes associated with immunity, detoxification and energy metabolism from the fat body of the oriental fruit fly, Bactrocera dorsalis.

    Directory of Open Access Journals (Sweden)

    Wen-Jia Yang

    Full Text Available The oriental fruit fly, Bactrocera dorsalis, is a destructive pest in tropical and subtropical areas. In this study, we performed transcriptome-wide analysis of the fat body of B. dorsalis and obtained more than 59 million sequencing reads, which were assembled into 27,787 unigenes with an average length of 591 bp. Among them, 17,442 (62.8%) unigenes matched known proteins in the NCBI database. The assembled sequences were further annotated with Gene Ontology, Clusters of Orthologous Groups terms, and the Kyoto Encyclopedia of Genes and Genomes. In-depth analysis was performed to identify genes putatively involved in immunity, detoxification, and energy metabolism. Many new genes were identified, including serpins, peptidoglycan recognition proteins and defensins, which were potentially linked to immune defense. Many detoxification genes were identified, including cytochrome P450s, glutathione S-transferases and ATP-binding cassette (ABC) transporters. Many new transcripts possibly involved in energy metabolism, including fatty acid desaturases, lipases, alpha amylases, and trehalose-6-phosphate synthases, were identified. Moreover, we randomly selected some genes to examine their expression patterns in different tissues by quantitative real-time PCR, which indicated that some genes exhibited fat body-specific expression in B. dorsalis. The identification of numerous transcripts in the fat body of B. dorsalis lays the foundation for future studies on the functions of these genes.

  10. Human heavy-chain variable region gene family nonrandomly rearranged in familial chronic lymphocytic leukemia

    International Nuclear Information System (INIS)

    Shen, A.; Humphries, C.; Tucker, P.; Blattner, F.

    1987-01-01

    The authors have identified a family of human immunoglobulin heavy-chain variable-region (VH) genes, one member of which is rearranged in two affected members of a family in which the father and four of five siblings developed chronic lymphocytic leukemia. Cloning and sequencing of the rearranged VH genes from leukemic lymphocytes of three affected siblings showed that two siblings had rearranged VH genes (VHTS1 and VHWS1) that were 90% homologous. The corresponding germ-line gene, VH251, was found to be part of a small (four-gene) VH gene family, which they term VHV. The DNA sequence homology to VHWS1 (95%) and VHTS1 (88%) and identical restriction sites on the 5' side of VH confirm that rearrangement of VH251 followed by somatic mutation produced the identical VH gene rearrangements in the two siblings. VHTS1 is not a functional VH gene; a functional VH rearrangement was found on the other chromosome of this patient. The other two siblings had different VH gene rearrangements. All used different diversity genes. Mechanisms proposed for nonrandom selection of a single VH gene include developmental regulation of this VH gene rearrangement or selection of a subpopulation of B cells in which this VH has been rearranged.

  11. Gene family expansions and contractions are associated with host range in plant pathogens of the genus Colletotrichum.

    Science.gov (United States)

    Baroncelli, Riccardo; Amby, Daniel Buchvaldt; Zapparata, Antonio; Sarrocco, Sabrina; Vannacci, Giovanni; Le Floch, Gaétan; Harrison, Richard J; Holub, Eric; Sukno, Serenella A; Sreenivasaprasad, Surapareddy; Thon, Michael R

    2016-08-05

    Many species belonging to the genus Colletotrichum cause anthracnose disease on a wide range of plant species. In addition to its economic impact, the genus Colletotrichum is a useful model for the study of the evolution of host specificity, speciation and reproductive behaviors. Genome projects of Colletotrichum species have already opened a new era for studying the evolution of pathogenesis in fungi. We sequenced and annotated the genomes of four strains in the Colletotrichum acutatum species complex (CAsc), a clade of broad host range pathogens within the genus. The four CAsc proteomes and secretomes along with those representing an additional 13 species (six Colletotrichum spp. and seven other Sordariomycetes) were classified into protein families using a variety of tools. Hierarchical clustering of gene family and functional domain assignments, and phylogenetic analyses, revealed lineage-specific losses of carbohydrate-active enzyme (CAZyme) and protease-encoding genes in Colletotrichum species that have a narrow host range, as well as duplications of these families in the CAsc. We also found a lineage-specific expansion of necrosis and ethylene-inducing peptide 1 (Nep1)-like protein (NLP) families within the CAsc. This study illustrates the plasticity of Colletotrichum genomes, and shows that major changes in host range are associated with relatively recent changes in gene content.

  12. Small Mutations of the DMD Gene in Taiwanese Families

    Directory of Open Access Journals (Sweden)

    Hsiao-Lin Hwa

    2008-06-01

    Conclusion: Most identified mutations either led to a predictable premature stop codon or resulted in splicing defects, which caused defective function of dystrophin. Our findings extend the mutation spectrum of the DMD gene. Molecular characterization of the affected families is important for genetic counseling and prenatal diagnosis.

  13. Genetic diversity of bitter taste receptor gene family in Sichuan

    Indian Academy of Sciences (India)

    Genetic diversity of bitter taste receptor gene family in Sichuan domestic and Tibetan chicken populations. YUAN SU DIYAN LI UMA GAUR YAN WANG NAN WU BINLONG CHEN HONGXIAN XU HUADONG YIN YAODONG HU QING ZHU. RESEARCH ARTICLE Volume 95 Issue 3 September 2016 pp 675-681 ...

  14. Genomewide analysis of TCP transcription factor gene family in ...

    Indian Academy of Sciences (India)

    2014-12-09

    Dec 9, 2014 ... study of a genomewide analysis of apple TCP gene family. These results provide .... synthesize the first-strand cDNA using the PrimeScript First. Strand cDNA ..... only detected in the stem, leaf and fruit (figure 8). When.

  15. Identification of the trehalose-6-phosphate synthase gene family in ...

    Indian Academy of Sciences (India)

    2015-03-04

    Mar 4, 2015 ... stress, however, our study mainly analysed the TPS gene family under freezing conditions in winter wheat .... size the first-strand cDNA using the Fermentas RevertAid ..... In the stem of Dongnongdongmai 1, TaTPS1, 2, 3, 4, 8,.

  16. Gene family size conservation is a good indicator of evolutionary rates.

    Science.gov (United States)

    Chen, Feng-Chi; Chen, Chiuan-Jung; Li, Wen-Hsiung; Chuang, Trees-Juen

    2010-08-01

    The evolution of duplicate genes has been a topic of broad interest. Here, we propose that the conservation of gene family size is a good indicator of the rate of sequence evolution and some other biological properties. By comparing the human-chimpanzee-macaque orthologous gene families with and without family size conservation, we demonstrate that genes with family size conservation evolve more slowly than those without family size conservation. Our results further demonstrate that both family expansion and contraction events may accelerate gene evolution, resulting in elevated evolutionary rates in the genes without family size conservation. In addition, we show that the duplicate genes with family size conservation evolve significantly more slowly than those without family size conservation. Interestingly, the median evolutionary rate of singletons falls in between those of the above two types of duplicate gene families. Our results thus suggest that the controversy on whether duplicate genes evolve more slowly than singletons can be resolved when family size conservation is taken into consideration. Furthermore, we also observe that duplicate genes with family size conservation have the highest level of gene expression/expression breadth, the highest proportion of essential genes, and the lowest gene compactness, followed by singletons and then by duplicate genes without family size conservation. Such a trend accords well with our observations of evolutionary rates. Our results thus point to the importance of family size conservation in the evolution of duplicate genes.

  17. Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression

    Directory of Open Access Journals (Sweden)

    Raherison Elie

    2012-08-01

    Full Text Available Abstract Background Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues. Results An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue-preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity were observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs. Conclusion Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group of conifer trees.

  18. Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

    Directory of Open Access Journals (Sweden)

    Waterston Robert H

    2010-02-01

    Full Text Available Abstract Background Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, due to the amount of noise present in the data and the challenges introduced by the increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which was developed specifically for this task. For a single experiment, this manual annotation task takes several hours. Results In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net. Conclusions We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.

  19. Evolution of the MAGUK protein gene family in premetazoan lineages

    Directory of Open Access Journals (Sweden)

    Ruiz-Trillo Iñaki

    2010-04-01

    Full Text Available Abstract Background Cell-to-cell communication is a key process in multicellular organisms. In multicellular animals, scaffolding proteins belonging to the family of membrane-associated guanylate kinases (MAGUK) are involved in the regulation and formation of cell junctions. These MAGUK proteins were believed to be exclusive to Metazoa. However, a MAGUK gene was recently identified in an EST survey of Capsaspora owczarzaki, a unicellular organism that branches off near the metazoan clade. To further investigate the evolutionary history of MAGUK, we have undertaken a broader search for this gene family using available genomic sequences of different opisthokont taxa. Results Our survey and phylogenetic analyses show that MAGUK proteins are present not only in Metazoa, but also in the choanoflagellate Monosiga brevicollis and in the protist Capsaspora owczarzaki. However, MAGUKs are absent from fungi, amoebozoans or any other eukaryote. The repertoire of MAGUKs in Placozoa and eumetazoan taxa (Cnidaria + Bilateria) is quite similar, except for one class that is missing in Trichoplax, while Porifera have a simpler MAGUK repertoire. However, Vertebrata have undergone several independent duplications and exhibit two exclusive MAGUK classes. Three different MAGUK types are found in both M. brevicollis and C. owczarzaki: DLG, MPP and MAGI. Furthermore, M. brevicollis has undergone a lineage-specific diversification. Conclusions The diversification of the MAGUK protein gene family occurred, most probably, prior to the divergence between Metazoa+choanoflagellates and the Capsaspora+Ministeria clade. MAGI-like, DLG-like, and MPP-like ancestral genes were already present in the unicellular ancestor of Metazoa, and new gene members have been incorporated through metazoan evolution within two major periods, one before the sponge-eumetazoan split and another within the vertebrate lineage. Moreover, choanoflagellates have suffered an independent MAGUK

  20. PlantTribes: a gene and gene family resource for comparative genomics in plants

    OpenAIRE

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2007-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, ca...
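    The clustering step named in this abstract can be illustrated with a minimal sketch of Markov clustering (MCL), which alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing) until the matrix stops changing. This is not the PlantTribes pipeline or the MCL reference implementation: the function names, the toy similarity scores, the inflation value of 2.0, and the thresholds are all assumptions chosen for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = matrix[i][j] / total if total else 0.0
    return out


def multiply(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a weighted similarity graph (hypothetical helper)."""
    n = len(similarity)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        expanded = multiply(m, m)                      # expansion
        inflated = normalize_columns(                  # inflation
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; the columns they
    # attract form one cluster. Identical clusters are de-duplicated.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy similarity scores between five hypothetical proteins (0-4).
toy_scores = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.95],
    [0.0, 0.0, 0.0, 0.95, 0.0],
]
print(mcl(toy_scores))
```

    In practice the input would be a sparse graph of all-against-all protein similarity scores, and the inflation parameter controls how finely the putative families are split; the dense list-of-lists representation here is only suitable for toy inputs.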

  1. Characterization of the MLO gene family in Rosaceae and gene expression analysis in Malus domestica.

    Science.gov (United States)

    Pessina, Stefano; Pavan, Stefano; Catalano, Domenico; Gallotta, Alessandra; Visser, Richard G F; Bai, Yuling; Malnoy, Mickael; Schouten, Henk J

    2014-07-22

    Powdery mildew (PM) is a major fungal disease of thousands of plant species, including many cultivated Rosaceae. PM pathogenesis is associated with up-regulation of MLO genes during early stages of infection, causing down-regulation of plant defense pathways. Specific members of the MLO gene family act as PM-susceptibility genes, as their loss-of-function mutations grant durable and broad-spectrum resistance. We carried out a genome-wide characterization of the MLO gene family in apple, peach and strawberry, and we isolated apricot MLO homologs through a PCR approach. Evolutionary relationships between MLO homologs were studied and syntenic blocks constructed. Homologs that are candidates for being PM susceptibility genes were inferred by phylogenetic relationships with functionally characterized MLO genes and, in apple, by monitoring their expression following inoculation with the PM causal pathogen Podosphaera leucotricha. Genomic tools available for Rosaceae were exploited in order to characterize the MLO gene family. Candidate MLO susceptibility genes were identified. Follow-up studies can investigate whether silencing or loss-of-function mutations in one or more of these candidate genes lead to PM resistance.

  2. Early evolution of the LIM homeobox gene family

    Energy Technology Data Exchange (ETDEWEB)

    Srivastava, Mansi; Larroux, Claire; Lu, Daniel R; Mohanty, Kareshma; Chapman, Jarrod; Degnan, Bernard M; Rokhsar, Daniel S

    2010-01-01

    LIM homeobox (Lhx) transcription factors are unique to the animal lineage and have patterning roles during embryonic development in flies, nematodes and vertebrates, with a conserved role in specifying neuronal identity. Though genes of this family have been reported in a sponge and a cnidarian, the expression patterns and functions of the Lhx family during development in non-bilaterian phyla are not known. We identified Lhx genes in two cnidarians and a placozoan and report the expression of Lhx genes during embryonic development in Nematostella and the demosponge Amphimedon. Members of the six major LIM homeobox subfamilies are represented in the genomes of the starlet sea anemone, Nematostella vectensis, and the placozoan Trichoplax adhaerens. The hydrozoan cnidarian, Hydra magnipapillata, has retained four of the six Lhx subfamilies, but apparently lost two others. Only three subfamilies are represented in the haplosclerid demosponge Amphimedon queenslandica. A tandem cluster of three Lhx genes of different subfamilies and a gene containing two LIM domains in the genome of T. adhaerens (an animal without any neurons) indicates that Lhx subfamilies were generated by tandem duplication. This tandem cluster in Trichoplax is likely a remnant of the original chromosomal context in which Lhx subfamilies first appeared. Three of the six Trichoplax Lhx genes are expressed in animals in laboratory culture, as are all Lhx genes in Hydra. Expression patterns of Nematostella Lhx genes correlate with neural territories in larval and juvenile polyp stages. In the aneural demosponge, A. queenslandica, the three Lhx genes are expressed widely during development, including in cells that are associated with the larval photosensory ring. The Lhx family expanded and diversified early in animal evolution, with all six subfamilies already diverged prior to the cnidarian-placozoan-bilaterian last common ancestor. In Nematostella, Lhx gene expression is correlated with neural

  3. Early evolution of the LIM homeobox gene family

    Directory of Open Access Journals (Sweden)

    Degnan Bernard M

    2010-01-01

    Full Text Available Abstract Background LIM homeobox (Lhx) transcription factors are unique to the animal lineage and have patterning roles during embryonic development in flies, nematodes and vertebrates, with a conserved role in specifying neuronal identity. Though genes of this family have been reported in a sponge and a cnidarian, the expression patterns and functions of the Lhx family during development in non-bilaterian phyla are not known. Results We identified Lhx genes in two cnidarians and a placozoan and report the expression of Lhx genes during embryonic development in Nematostella and the demosponge Amphimedon. Members of the six major LIM homeobox subfamilies are represented in the genomes of the starlet sea anemone, Nematostella vectensis, and the placozoan Trichoplax adhaerens. The hydrozoan cnidarian, Hydra magnipapillata, has retained four of the six Lhx subfamilies, but apparently lost two others. Only three subfamilies are represented in the haplosclerid demosponge Amphimedon queenslandica. A tandem cluster of three Lhx genes of different subfamilies and a gene containing two LIM domains in the genome of T. adhaerens (an animal without any neurons) indicates that Lhx subfamilies were generated by tandem duplication. This tandem cluster in Trichoplax is likely a remnant of the original chromosomal context in which Lhx subfamilies first appeared. Three of the six Trichoplax Lhx genes are expressed in animals in laboratory culture, as are all Lhx genes in Hydra. Expression patterns of Nematostella Lhx genes correlate with neural territories in larval and juvenile polyp stages. In the aneural demosponge, A. queenslandica, the three Lhx genes are expressed widely during development, including in cells that are associated with the larval photosensory ring. Conclusions The Lhx family expanded and diversified early in animal evolution, with all six subfamilies already diverged prior to the cnidarian-placozoan-bilaterian last common ancestor. In

  4. RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken and allows comprehensive annotation of the W-chromosome

    Science.gov (United States)

    2013-01-01

    Background Birds have a ZZ male: ZW female sex chromosome system and while the Z-linked DMRT1 gene is necessary for testis development, the exact mechanism of sex determination in birds remains unsolved. This is partly due to the poor annotation of the W chromosome, which is speculated to carry a female determinant. Few genes have been mapped to the W and little is known of their expression. Results We used RNA-seq to produce a comprehensive profile of gene expression in chicken blastoderms and embryonic gonads prior to sexual differentiation. We found robust sexually dimorphic gene expression in both tissues pre-dating gonadogenesis, including sex-linked and autosomal genes. This supports the hypothesis that sexual differentiation at the molecular level is at least partly cell autonomous in birds. Different sets of genes were sexually dimorphic in the two tissues, indicating that molecular sexual differentiation is tissue specific. Further analyses allowed the assembly of full-length transcripts for 26 W chromosome genes, providing a view of the W transcriptome in embryonic tissues. This is the first extensive analysis of W-linked genes and their expression profiles in early avian embryos. Conclusion Sexual differentiation at the molecular level is established in chicken early in embryogenesis, before gonadal sex differentiation. We find that the W chromosome is more transcriptionally active than previously thought, expand the number of known genes to 26 and present complete coding sequences for these W genes. This includes two novel W-linked sequences and three small RNAs reassigned to the W from the Un_Random chromosome. PMID:23531366

  5. Plant ion channels: gene families, physiology, and functional genomics analyses.

    Science.gov (United States)

    Ward, John M; Mäser, Pascal; Schroeder, Julian I

    2009-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide-gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport.

  6. Chromosomal evolution of the PKD1 gene family in primates

    Directory of Open Access Journals (Sweden)

    Krawczak Michael

    2008-09-01

    Full Text Available Abstract Background Autosomal dominant polycystic kidney disease (ADPKD) is mostly caused by mutations in the PKD1 (polycystic kidney disease 1) gene located in 16p13.3. Moreover, there are six pseudogenes of PKD1 that are located proximal to the master gene in 16p13.1. In contrast, no pseudogene could be detected in the mouse genome, only a single copy gene on chromosome 17. The question arises how the human situation originated phylogenetically. To address this question we applied comparative FISH-mapping of a human PKD1-containing genomic BAC clone and a PKD1-cDNA clone to chromosomes of a variety of primate species and the dog as a non-primate outgroup species. Results Comparative FISH with the PKD1-cDNA clone clearly shows that in all primate species studied distinct single signals map in subtelomeric chromosomal positions orthologous to the short arm of human chromosome 16 harbouring the master PKD1 gene. Only in human and African great apes, but not in orangutan, FISH with both BAC and cDNA clones reveals additional signal clusters located proximal to and clearly separated from the PKD1 master genes, indicating the chromosomal position of PKD1 pseudogenes in 16p of these species, respectively. Indeed, this is in accordance with sequencing data in human, chimpanzee and orangutan. Apart from the master PKD1 gene, six pseudogenes are identified in both human and chimpanzee, while only a single-copy gene is present in the whole-genome sequence of orangutan. The phylogenetic reconstruction of the PKD1-tree reveals that all human pseudogenes are closely related to the human PKD1 gene, and all chimpanzee pseudogenes are closely related to the chimpanzee PKD1 gene. However, our statistical analyses provide strong indication that gene conversion events may have occurred within the PKD1 family members of human and chimpanzee, respectively. Conclusion PKD1 must have undergone amplification very recently in hominid evolution. Duplicative

  7. The claudin gene family: expression in normal and neoplastic tissues

    International Nuclear Information System (INIS)

    Hewitt, Kyle J; Agarwal, Rachana; Morin, Patrice J

    2006-01-01

    The claudin (CLDN) genes encode a family of proteins important in tight junction formation and function. Recently, it has become apparent that CLDN gene expression is frequently altered in several human cancers. However, the exact patterns of CLDN expression in various cancers are unknown, as only a limited number of CLDN genes have been investigated in a few tumors. We identified all the human CLDN genes from Genbank and we used the large public SAGE database to ascertain the gene expression of all 21 CLDN in 266 normal and neoplastic tissues. Using real-time RT-PCR, we also surveyed a subset of 13 CLDN genes in 24 normal and 24 neoplastic tissues. We show that claudins represent a family of highly related proteins, with claudin-16 and -23 being the most different from the others. From in silico analysis and RT-PCR data, we find that most claudin genes appear decreased in cancer, while CLDN3, CLDN4, and CLDN7 are elevated in several malignancies such as those originating from the pancreas, bladder, thyroid, fallopian tubes, ovary, stomach, colon, breast, uterus, and the prostate. Interestingly, CLDN5 is highly expressed in vascular endothelial cells, providing a possible target for antiangiogenic therapy. CLDN18 might represent a biomarker for gastric cancer. Our study confirms previously known CLDN gene expression patterns and identifies new ones, which may have applications in the detection, prognosis and therapy of several human cancers. In particular we identify several malignancies that express CLDN3 and CLDN4. These cancers may represent ideal candidates for a novel therapy being developed based on CPE, a toxin that specifically binds claudin-3 and claudin-4.

  8. A comprehensive family-based replication study of schizophrenia genes

    DEFF Research Database (Denmark)

    Aberg, Karolina A; Liu, Youfang; Bukszár, Jozsef

    2013-01-01

     768 control subjects from 6 databases and, after quality control 6298 individuals (including 3286 cases) from 1811 nuclear families. MAIN OUTCOMES AND MEASURES Case-control status for SCZ. RESULTS Replication results showed a highly significant enrichment of SNPs with small P values. Of the SNPs...... in an independent family-based replication study that, after quality control, consisted of 8107 SNPs. SETTING Linkage meta-analysis, brain transcriptome meta-analysis, candidate gene database, OMIM, relevant mouse studies, and expression quantitative trait locus databases. PATIENTS We included 11 185 cases and 10...

  9. Massive expansion of the calpain gene family in unicellular eukaryotes

    Directory of Open Access Journals (Sweden)

    Zhao Sen

    2012-09-01

    Full Text Available Abstract Background Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Results Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures; 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. Conclusions The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  10. Massive expansion of the calpain gene family in unicellular eukaryotes.

    Science.gov (United States)

    Zhao, Sen; Liang, Zhe; Demko, Viktor; Wilson, Robert; Johansen, Wenche; Olsen, Odd-Arne; Shalchian-Tabrizi, Kamran

    2012-09-29

    Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures; 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  11. NDP gene mutations in 14 French families with Norrie disease.

    Science.gov (United States)

    Royer, Ghislaine; Hanein, Sylvain; Raclin, Valérie; Gigarel, Nadine; Rozet, Jean-Michel; Munnich, Arnold; Steffann, Julie; Dufier, Jean-Louis; Kaplan, Josseline; Bonnefont, Jean-Paul

    2003-12-01

    Norrie disease is a rare X-linked recessive condition characterized by congenital blindness and occasionally deafness and mental retardation in males. This disease has been ascribed to mutations in the NDP gene on chromosome Xp11.1. Previous investigations of the NDP gene have identified largely sixty disease-causing sequence variants. Here, we report on ten different NDP gene allelic variants in fourteen of a series of 21 families fulfilling inclusion criteria. Two alterations were intragenic deletions and eight were nucleotide substitutions or splicing variants, six of them being hitherto unreported, namely c.112C>T (p.Arg38Cys), c.129C>G (p.His43Gln), c.133G>A (p.Val45Met), c.268C>T (p.Arg90Cys), c.382T>C (p.Cys128Arg), c.23479-1G>C (unknown). No NDP gene sequence variant was found in seven of the 21 families. This observation raises the issue of misdiagnosis, phenocopies, or existence of other X-linked or autosomal genes, the mutations of which would mimic the Norrie disease phenotype. Copyright 2003 Wiley-Liss, Inc.
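    The variant names listed in this abstract follow a regular "c.<position><ref>><alt> (<protein effect>)" pattern, so they can be pulled into structured records with a short parser. The sketch below is only an illustration: the `Variant` type, the regular expression, and the `parse_variants` helper are hypothetical names, and the pattern covers only simple substitutions of the kind quoted here, not the full variant nomenclature.

```python
import re
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    cdna_position: str             # e.g. "112" or "23479-1" (intronic offset kept as text)
    ref_base: str
    alt_base: str
    protein_change: Optional[str]  # None when the abstract gives no p. notation


# Matches simple substitutions such as "c.112C>T (p.Arg38Cys)".
SUBSTITUTION = re.compile(
    r"c\.(?P<pos>\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"(?:\s*\((?P<effect>[^)]*)\))?"
)


def parse_variants(text: str) -> List[Variant]:
    """Extract every simple substitution found in a free-text string."""
    variants = []
    for match in SUBSTITUTION.finditer(text):
        effect = match.group("effect")
        # Keep the bracketed text only if it is a protein-level description.
        protein = effect if effect and effect.startswith("p.") else None
        variants.append(Variant(match.group("pos"), match.group("ref"),
                                match.group("alt"), protein))
    return variants


# The six previously unreported variants, copied from the abstract.
reported = ("c.112C>T (p.Arg38Cys), c.129C>G (p.His43Gln), c.133G>A (p.Val45Met), "
            "c.268C>T (p.Arg90Cys), c.382T>C (p.Cys128Arg), c.23479-1G>C (unknown)")

for variant in parse_variants(reported):
    print(variant)
```

    Keeping the position as text preserves the intronic offset in the last variant; the bracketed "unknown" is stored as a missing protein change rather than guessed.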

  12. Leiomodins: larger members of the tropomodulin (Tmod) gene family

    Science.gov (United States)

    Conley, C. A.; Fritz-Six, K. L.; Almenar-Queralt, A.; Fowler, V. M.

    2001-01-01

    The 64-kDa autoantigen D1 or 1D, first identified as a potential autoantigen in Graves' disease, is similar to the tropomodulin (Tmod) family of actin filament pointed end-capping proteins. A novel gene with significant similarity to the 64-kDa human autoantigen D1 has been cloned from both humans and mice, and the genomic sequences of both genes have been identified. These genes form a subfamily closely related to the Tmods and are here named the Leiomodins (Lmods). Both Lmod genes display a conserved intron-exon structure, as do three Tmod genes, but the intron-exon structure of the Lmods and the Tmods is divergent. mRNA expression analysis indicates that the gene formerly known as the 64-kDa autoantigen D1 is most highly expressed in a variety of human tissues that contain smooth muscle, earning it the name smooth muscle Leiomodin (SM-Lmod; HGMW-approved symbol LMOD1). Transcripts encoding the novel Lmod gene are present exclusively in fetal and adult heart and adult skeletal muscle, and it is here named cardiac Leiomodin (C-Lmod; HGMW-approved symbol LMOD2). Human C-Lmod is located near the hypertrophic cardiomyopathy locus CMH6 on human chromosome 7q3, potentially implicating it in this disease. Our data demonstrate that the Lmods are evolutionarily related and display tissue-specific patterns of expression distinct from, but overlapping with, the expression of Tmod isoforms. Copyright 2001 Academic Press.

  13. Evolutionary history of chordate PAX genes: dynamics of change in a complex gene family.

    Directory of Open Access Journals (Sweden)

    Vanessa Rodrigues Paixão-Côrtes

    Full Text Available Paired box (PAX) genes are transcription factors that play important roles in embryonic development. Although the PAX gene family occurs in animals only, it is widely distributed. Among the vertebrates, its 9 genes appear to be the product of complete duplication of an original set of 4 genes, followed by an additional partial duplication. Although some studies of PAX genes have been conducted, no comprehensive survey of these genes across the entire taxonomic unit has yet been attempted. In this study, we conducted a detailed comparison of PAX sequences from 188 chordates, which revealed restricted variation. The absence of PAX4 and PAX8 among some species of reptiles and birds was notable; however, all 9 genes were present in all 74 mammalian genomes investigated. A search for signatures of selection indicated that all genes are subject to purifying selection, with a possible constraint relaxation in PAX4, PAX7, and PAX8. This result indicates asymmetric evolution of PAX family genes, which can be associated with the emergence of adaptive novelties in the chordate evolutionary trajectory.

  14. Annotated bibliography

    International Nuclear Information System (INIS)

    1997-08-01

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners.

  15. The nitrate transporter (NRT) gene family in poplar.

    Directory of Open Access Journals (Sweden)

    Hua Bai

    Full Text Available Nitrate is an important nutrient required for plant growth. It also acts as a signal regulating plant development. Nitrate is actively taken up and transported by nitrate transporters (NRT), which form a large family with many members and distinct functions. In contrast to Arabidopsis and rice, there is little information about the NRT family in woody plants such as Populus. In this study, a comprehensive analysis of the Populus NRT family was performed. Sixty-eight PtNRT1/PTR, 6 PtNRT2, and 5 PtNRT3 genes were identified in the P. trichocarpa genome. Phylogenetic analysis confirmed that the genes of the NRT family are divided into three clades: NRT1/PTR with four subclades, NRT2, and NRT3. Topological analysis indicated that all members of PtNRT1/PTR and PtNRT2 have 8 to 12 trans-membrane domains, whereas the PtNRT3 proteins have no or up to two trans-membrane domains. Four PtNRT3 members were predicted as secreted proteins. Microarray analyses revealed tissue-specific expression patterns of PtNRT genes with distinct clusters of NRTs for roots, for the elongation zone of the apical stem segment and the developing xylem and a further cluster for leaves, bark and wood. A comparison of different poplar species (P. trichocarpa, P. tremula, P. euphratica, P. fremontii x P. angustifolia, and P. x canescens) showed that the tissue-specific patterns of the NRT genes varied to some extent with species. Bioinformatic analysis of putative cis-regulatory elements in the promoter regions of PtNRT family retrieved motifs suggesting the regulation of the NRT genes by N metabolism, by energy and carbon metabolism, and by phytohormones and stress. Multivariate analysis suggested that the combination and abundance of motifs in distinct promoters may lead to tissue-specificity. Our genome wide analysis of the PtNRT genes provides a valuable basis for functional analysis towards understanding the role of nitrate transporters for tree growth.

  16. Phylogenetic molecular function annotation

    International Nuclear Information System (INIS)

    Engelhardt, Barbara E; Jordan, Michael I; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic approach for predicting molecular function (sometimes called 'phylogenomics') is an effective means to predict protein molecular function. These methods incorporate functional evidence from all members of a family that have functional characterizations using the evolutionary history of the protein family to make robust predictions for the uncharacterized proteins. However, they are often difficult to apply on a genome-wide scale because of the time-consuming step of reconstructing the phylogenies of each protein to be annotated. Our automated approach for function annotation using phylogeny, the SIFTER (Statistical Inference of Function Through Evolutionary Relationships) methodology, uses a statistical graphical model to compute the probabilities of molecular functions for unannotated proteins. Our benchmark tests showed that SIFTER provides accurate functional predictions on various protein families, outperforming other available methods.

  17. Diverse roles of ERECTA family genes in plant development.

    Science.gov (United States)

    Shpak, Elena D

    2013-12-01

    Multiple receptor-like kinases (RLKs) enable intercellular communication that coordinates growth and development of plant tissues. ERECTA family receptors (ERfs) are an ancient family of leucine-rich repeat RLKs that in Arabidopsis consists of three genes: ERECTA, ERL1, and ERL2. ERfs sense secreted cysteine-rich peptides from the EPF/EPFL family and transmit the signal through a MAP kinase cascade. This review discusses the functions of ERfs in stomata development, in regulation of longitudinal growth of aboveground organs, during reproductive development, and in the shoot apical meristem. In addition the role of ERECTA in plant responses to biotic and abiotic factors is examined. Elena D. Shpak (Corresponding author). © 2013 Institute of Botany, Chinese Academy of Sciences.

  18. Molecular study of the perforin gene in familial hematological malignancies

    Directory of Open Access Journals (Sweden)

    El Abed Rim

    2011-09-01

    Full Text Available Abstract Perforin gene (PRF1) mutations have been identified in some patients diagnosed with the familial form of hemophagocytic lymphohistiocytosis (HLH) and in patients with lymphoma. The aim of the present study was to determine whether patients with a familial aggregation of hematological malignancies harbor germline perforin gene mutations. For this purpose, 81 unrelated families from Tunisia and France with aggregated hematological malignancies were investigated. The variants detected in the PRF1 coding region amounted to 3.7% (3/81). Two of the three variants identified were previously described: the p.Ala91Val pathogenic mutation and the p.Asn252Ser polymorphism. A new p.Ala211Val missense substitution was identified in two related Tunisian patients. In order to assess the pathogenicity of this new variation, bioinformatic tools were used to predict its effects on the perforin protein structure and at the mRNA level. The segregation of the mutant allele was studied in the family of interest and a control population was screened. The fact that this variant was not found to occur in 200 control chromosomes suggests that it may be pathogenic. However, overexpression of mutated PRF1 in rat basophilic leukemia cells did not affect the lytic function of perforin differently from the wild type protein.

  19. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Wei Li

    2016-04-01

    Full Text Available Mitogen‐activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome‐wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high‐throughput sequencing data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA‐seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome‐wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula.

  20. Repair of DNA damage in the human metallothionein gene family

    International Nuclear Information System (INIS)

    Leadon, S.A.; Snowden, M.M.

    1987-01-01

    In order to distinguish enhanced repair of a sequence due to its transcriptional activity from enhanced repair due to chromatin alterations brought about by integration of a sequence into the genome, we have investigated the repair of damage both in endogenous genes and in cell lines that contain an integrated gene with an inducible promoter. The endogenous genes we are studying are the metallothioneins (MTs), a multigene family in man consisting of about 10-12 members. Cultured cells were exposed to 10 J/m² UV light and allowed to repair in the presence of bromodeoxyuridine. The DNA was then isolated, digested with EcoRI, and fully hybrid density DNA made by semiconservative synthesis was separated from unreplicated DNA by centrifugation in CsCl density gradients. Unreplicated, parental-density DNA was then reacted with a monoclonal antibody against bromouracil. 1 ref., 1 fig., 1 tab.

  1. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Directory of Open Access Journals (Sweden)

    Bibby Kyle

    2011-03-01

    Full Text Available Abstract Background: Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results: Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions: The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  2. Annotation of Differential Gene Expression in Small Yellow Follicles of a Broiler-Type Strain of Taiwan Country Chickens in Response to Acute Heat Stress.

    Science.gov (United States)

    Cheng, Chuen-Yu; Tu, Wei-Lin; Wang, Shih-Han; Tang, Pin-Chi; Chen, Chih-Feng; Chen, Hsin-Hsin; Lee, Yen-Pai; Chen, Shuen-Ei; Huang, San-Yuan

    2015-01-01

    This study investigated global gene expression in the small yellow follicles (6-8 mm diameter) of broiler-type B strain Taiwan country chickens (TCCs) in response to acute heat stress. Twelve 30-wk-old TCC hens were divided into four groups: control hens maintained at 25°C and hens subjected to 38°C acute heat stress for 2 h without recovery (H2R0), with 2-h recovery (H2R2), and with 6-h recovery (H2R6). Small yellow follicles were collected for RNA isolation and microarray analysis at the end of each time point. Results showed that 69, 51, and 76 genes were upregulated and 58, 15, 56 genes were downregulated after heat treatment of H2R0, H2R2, and H2R6, respectively, using a cutoff value of two-fold or higher. Gene ontology analysis revealed that these differentially expressed genes are associated with the biological processes of cell communication, developmental process, protein metabolic process, immune system process, and response to stimuli. Upregulation of heat shock protein 25, interleukin 6, metallopeptidase 1, and metalloproteinase 13, and downregulation of type II alpha 1 collagen, discoidin domain receptor tyrosine kinase 2, and Kruppel-like factor 2 suggested that acute heat stress induces proteolytic disintegration of the structural matrix and inflamed damage and adaptive responses of gene expression in the follicle cells. These suggestions were validated through gene expression, using quantitative real-time polymerase chain reaction. Functional annotation clarified that interleukin 6-related pathways play a critical role in regulating acute heat stress responses in the small yellow follicles of TCC hens.
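
    The two-fold cutoff mentioned in the abstract above can be illustrated with a short sketch. This is only an assumed reading of the criterion (a ratio of treated to control signal of at least 2 in either direction); the function name and the example intensities are hypothetical and are not taken from the study.

```python
import math


def classify_fold_change(control: float, treated: float, fold: float = 2.0) -> str:
    """Label a gene as 'up', 'down' or 'unchanged' given a fold-change cutoff.

    Assumption: both signals are positive, already-normalized intensities.
    """
    if control <= 0 or treated <= 0:
        raise ValueError("signals must be positive")
    log2_ratio = math.log2(treated / control)
    threshold = math.log2(fold)  # a 2-fold cutoff corresponds to |log2 ratio| >= 1
    if log2_ratio >= threshold:
        return "up"
    if log2_ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical intensities for three genes (control, heat-stressed).
for name, ctrl, heat in [("geneA", 100.0, 250.0), ("geneB", 100.0, 40.0), ("geneC", 100.0, 150.0)]:
    print(name, classify_fold_change(ctrl, heat))
```

    Working on the log2 scale treats up- and downregulation symmetrically, which is why the same threshold is applied with opposite signs.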

  3. Polymorphism in the interferon-α gene family

    Energy Technology Data Exchange (ETDEWEB)

    Golovleva, I.; Lundgren, E.; Beckman, L. [Univ. of Umea (Sweden)]; Kandefer-Szerszen, M. [Maria Curie-Sklodowska Univ., Lublin (Poland)]

    1996-09-01

    A pronounced genetic polymorphism of the interferon type I gene family has been assumed on the basis of RFLP analysis of the genomic region as well as the large number of sequences published compared to the number of loci. However, IFNA2 is the only locus that has been carefully analyzed concerning gene frequency, and only naturally occurring rare alleles have been found. We have extended the studies on a variation of expressed sequences by studying the IFNA1, IFNA2, IFNA10, IFNA13, IFNA14, and IFNA17 genes. Genomic white-blood-cell DNA from a population sample of blood donors and from a family material were screened by single-nucleotide primer extension (allele-specific primer extension) of PCR fragments. Because of sequence similarities, in some cases "nested" PCR was used, and, when applicable, restriction analysis or control sequencing was performed. All individuals carried the interferon-α 1 and interferon-α 13 variants but not the LeIF D variant. At the IFNA2 and IFNA14 loci only one sequence variant was found, while in the IFNA10 and IFNA17 groups two alleles were detected in each group. The IFNA10 and IFNA17 alleles segregated in families and showed a close fit to the Hardy-Weinberg equilibrium. There was a significant linkage disequilibrium between IFNA10 and IFNA17 alleles. The fact that the extent of genetic polymorphism was lower than expected suggests that a majority of the previously described gene sequences represent nonpolymorphic rare mutants that may have arisen in tumor cell lines. 44 refs., 4 figs., 4 tabs.
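
    A fit to Hardy-Weinberg equilibrium, as reported for the two-allele loci in the abstract above, is commonly assessed with a chi-square goodness-of-fit test. The sketch below shows that standard calculation for a biallelic locus; it is not taken from the paper, and the genotype counts are invented for illustration.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a locus with two alleles (a and b)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p                      # frequency of allele b
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (a monomorphic locus) to avoid dividing by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: 25 aa, 50 ab, 25 bb individuals.
print(hwe_chi_square(25, 50, 25))
```

    With one degree of freedom, a statistic below about 3.84 is consistent with equilibrium at the 0.05 level.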

  4. Gene screening in a Chinese family with Marfan syndrome

    Directory of Open Access Journals (Sweden)

    Wen-Jiao Xia

    2016-05-01

    Full Text Available AIM: To analyze the causative gene mutation for Marfan syndrome (MFS) with autosomal dominant inheritance in a Chinese family in Liaoning Province, China. METHODS: Venous blood was collected, and a candidate gene was selected according to the clinical phenotype for primer design. Genomic polymerase chain reaction (PCR) was performed, the coding exons and flanking intron sequences of the candidate gene were amplified, DNA fragments were separated by agarose gel electrophoresis, and direct sequencing was used to determine the pathogenic gene. RESULTS: The proband presented with ectopia lentis. Sequencing of the coding regions of the FBN1 gene showed a heterozygous A→G substitution at nucleotide 640 in exon 7 of FBN1, a missense mutation changing glycine to serine (G214S). CONCLUSION: A heterozygous mutation of FBN1, c.A640G (p.G214S), is responsible for Marfan syndrome in the four-generation Chinese pedigree.
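
    The nucleotide and residue numbers quoted in the abstract above can be related by simple codon arithmetic. A minimal sketch, assuming coding-sequence numbering that starts at 1 on the first base of the start codon; the function name is hypothetical.

```python
from typing import Tuple


def codon_of(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to
    (codon number, position within the codon), both 1-based."""
    if cds_position < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


# Nucleotide 640 falls on the first base of codon 214.
print(codon_of(640))
```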

  5. Genome-wide identification and tissue-specific expression analysis of nucleotide binding site-leucine rich repeat gene family in Cicer arietinum (kabuli chickpea).

    Science.gov (United States)

    Sharma, Ranu; Rawat, Vimal; Suresh, C G

    2017-12-01

    The nucleotide binding site-leucine rich repeat (NBS-LRR) proteins play an important role in the defense mechanisms against pathogens. Using a bioinformatics approach, we identified and annotated 104 NBS-LRR genes in chickpea. Phylogenetic analysis points to their diversification into two families, namely TIR-NBS-LRR and non-TIR-NBS-LRR. Gene architecture revealed intron gain/loss events in this resistance gene family during their independent evolution into two families. Comparative genomics analysis elucidated its evolutionary relationship with other Fabaceae species. Around 50% of the NBS-LRRs reside in macro-syntenic blocks, underlining positional conservation along with sequence conservation of NBS-LRR genes in chickpea. Transcriptome sequencing data provided evidence for their transcription and tissue-specific expression. Four cis-regulatory elements, namely WBOX, DRE, CBF, and GCC boxes, that commonly occur in resistance genes were present in the promoter regions of these genes. Further, the findings will provide a strong background to use candidate disease resistance NBS-encoding genes and identify their specific roles in chickpea.

  6. Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes.

    Directory of Open Access Journals (Sweden)

    Casey R Richardson

    2010-05-01

    Full Text Available MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM when trained on targets could predict the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for "new" rapidly-evolving MIRNA genes. Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other

  7. The Tomato Terpene Synthase Gene Family

    Science.gov (United States)

    Falara, Vasiliki; Akhtar, Tariq A.; Nguyen, Thuong T.H.; Spyropoulou, Eleni A.; Bleeker, Petra M.; Schauvinhold, Ines; Matsuba, Yuki; Bonini, Megan E.; Schilmiller, Anthony L.; Last, Robert L.; Schuurink, Robert C.; Pichersky, Eran

    2011-01-01

    Compounds of the terpenoid class play numerous roles in the interactions of plants with their environment, such as attracting pollinators and defending the plant against pests. We show here that the genome of cultivated tomato (Solanum lycopersicum) contains 44 terpene synthase (TPS) genes, including 29 that are functional or potentially functional. Of these 29 TPS genes, 26 were expressed in at least some organs or tissues of the plant. The enzymatic functions of eight of the TPS proteins were previously reported, and here we report the specific in vitro catalytic activity of 10 additional tomato terpene synthases. Many of the tomato TPS genes are found in clusters, notably on chromosomes 1, 2, 6, 8, and 10. All TPS family clades previously identified in angiosperms are also present in tomato. The largest clade of functional TPS genes found in tomato, with 12 members, is the TPS-a clade, and it appears to encode only sesquiterpene synthases, one of which is localized to the mitochondria, while the rest are likely cytosolic. A few additional sesquiterpene synthases are encoded by TPS-b clade genes. Some of the tomato sesquiterpene synthases use z,z-farnesyl diphosphate in vitro as well, or more efficiently than, the e,e-farnesyl diphosphate substrate. Genes encoding monoterpene synthases are also prevalent, and they fall into three clades: TPS-b, TPS-g, and TPS-e/f. With the exception of two enzymes involved in the synthesis of ent-kaurene, the precursor of gibberellins, no other tomato TPS genes could be demonstrated to encode diterpene synthases so far. PMID:21813655

  8. Functional genomics tools applied to plant metabolism: a survey on plant respiration, its connections and the annotation of complex gene functions

    Directory of Open Access Journals (Sweden)

    Wagner L. Araújo

    2012-09-01

    Full Text Available The application of post-genomic techniques in plant respiration studies has greatly improved our ability to assign functions to gene products. In addition, it has also revealed previously unappreciated interactions between distal elements of metabolism. Such results have reinforced the need to consider plant respiratory metabolism as part of a complex network, and making sense of such interactions will ultimately require the construction of predictive and mechanistic models. Transcriptomics, proteomics, metabolomics and the quantification of metabolic flux will be of great value in creating such models both by facilitating the annotation of complex gene function, determining their structure and by furnishing the quantitative data required to test them. In this review we highlight how these experimental approaches have contributed to our current understanding of plant respiratory metabolism and its interplay with associated processes (e.g. photosynthesis, photorespiration and nitrogen metabolism). We also discuss how data from these techniques may be integrated, with the ultimate aim of identifying mechanisms that control and regulate plant respiration and discovering novel gene functions with potential biotechnological implications.

  9. Amelogenesis Imperfecta: 1 Family, 2 Phenotypes, and 2 Mutated Genes.

    Science.gov (United States)

    Prasad, M K; Laouina, S; El Alloussi, M; Dollfus, H; Bloch-Zupan, A

    2016-12-01

    Amelogenesis imperfecta (AI) is a clinically and genetically heterogeneous group of diseases characterized by enamel defects. The authors have identified a large consanguineous Moroccan family segregating different clinical subtypes of hypoplastic and hypomineralized AI in different individuals within the family. Using targeted next-generation sequencing, the authors identified a novel heterozygous nonsense mutation in COL17A1 (c.1873C>T, p.R625*) segregating with hypoplastic AI and a novel homozygous 8-bp deletion in C4orf26 (c.39_46del, p.Cys14Glyfs*18) segregating with hypomineralized-hypoplastic AI in this family. This study highlights the phenotypic and genotypic heterogeneity of AI that can exist even within a single consanguineous family. Furthermore, the identification of novel mutations in COL17A1 and C4orf26 and their correlation with distinct AI phenotypes can contribute to a better understanding of the pathophysiology of AI and the contribution of these genes to amelogenesis. © International & American Associations for Dental Research 2016.

  10. Differential expression pattern of UBX family genes in Caenorhabditis elegans

    International Nuclear Information System (INIS)

    Yamauchi, Seiji; Sasagawa, Yohei; Ogura, Teru; Yamanaka, Kunitoshi

    2007-01-01

    UBX (ubiquitin regulatory X)-containing proteins belong to an evolutionary conserved protein family and determine the specificity of p97/VCP/Cdc48p function by binding as its adaptors. Caenorhabditis elegans was found to possess six UBX-containing proteins, named UBXN-1 to -6. However, no general or specific function of them has been revealed. During the course of understanding not only their function but also specified function of p97, we investigated spatial and temporal expression patterns of six ubxn genes in this study. Transcript analyses showed that the expression pattern of each ubxn gene was different throughout worm's development and may show potential developmental dynamics in their function, especially ubxn-5 was expressed specifically in the spermatogenic germline, suggesting a crucial role in spermatogenesis. In addition, as ubxn-4 expression was induced by ER stress, it would function as an ERAD factor in C. elegans. In vivo expression analysis by using GFP translational fusion constructs revealed that six ubxn genes show distinct expression patterns. These results altogether demonstrate that the expression of all six ubxn genes of C. elegans is differently regulated

  11. The roles of gene duplication, gene conversion and positive selection in rodent Esp and Mup pheromone gene families with comparison to the Abp family.

    Science.gov (United States)

    Karn, Robert C; Laukaitis, Christina M

    2012-01-01

    Three proteinaceous pheromone families, the androgen-binding proteins (ABPs), the exocrine-gland secreting peptides (ESPs) and the major urinary proteins (MUPs) are encoded by large gene families in the genomes of Mus musculus and Rattus norvegicus. We studied the evolutionary histories of the Mup and Esp genes and compared them with what is known about the Abp genes. Apparently gene conversion has played little if any role in the expansion of the mouse Class A and Class B Mup genes and pseudogenes, and the rat Mups. By contrast, we found evidence of extensive gene conversion in many Esp genes although not in all of them. Our studies of selection identified at least two amino acid sites in β-sheets as having evolved under positive selection in the mouse Class A and Class B MUPs and in rat MUPs. We show that selection may have acted on the ESPs by determining K(a)/K(s) for Exon 3 sequences with and without the converted sequence segment. While it appears that purifying selection acted on the ESP signal peptides, the secreted portions of the ESPs probably have undergone much more rapid evolution. When the inner gene converted fragment sequences were removed, eleven Esp paralogs were present in two or more pairs with K(a)/K(s) >1.0 and thus we propose that positive selection is detectable by this means in at least some mouse Esp paralogs. We compare and contrast the evolutionary histories of all three mouse pheromone gene families in light of their proposed functions in mouse communication.

  12. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background: Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix is incomplete; for example, it does not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers time in searching multiple GO databases for functional information. Results: We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers to quickly retrieve corresponding functional information for their dataset. Conclusion: Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. 
The GO annotation data generated will be available for public use via AgBase website and

  13. De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies

    Directory of Open Access Journals (Sweden)

    Michelle Thunders

    Full Text Available Abstract Background Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida are commonly used in ecotoxicological studies; therefore the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome wide gene expression. Results This paper reports on the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used to carry out functional annotation of the Trinity generated transcriptome file and the transdecoder generated peptide sequence file along with BLASTX, BLASTP and HMMER searches and were loaded into a Sqlite3 database. To identify differentially expressed transcripts; each of the original sequence files were aligned to the de novo assembled transcriptome using Bowtie and then RSEM was used to estimate expression values based on the alignment. EdgeR was used to calculate differential expression between the two conditions, with an FDR corrected P value cut off of 0.001, this returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence associated proteins. At a cut off of P = 0.01 there were a further 26 differentially expressed genes. Conclusion These data have been made publicly available, and to our knowledge represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
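
    The abstract above filters transcripts on an FDR-corrected P value. As an illustration of what such a correction does, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python. The abstract does not name the correction procedure, so Benjamini-Hochberg is an assumption here, and the P values are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four transcripts.
raw = [0.0001, 0.03, 0.0004, 0.2]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q <= 0.001]
print(adj, significant)
```

    Applying the cutoff to the adjusted values rather than the raw ones controls the expected proportion of false positives among the transcripts called significant.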

  14. Estimating the annotation error rate of curated GO database sequence annotations

    Directory of Open Access Journals (Sweden)

    Brown Alfred L

    2007-05-01

    Full Text Available Abstract Background: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.

  15. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    Full Text Available To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members of a miRNA group always had various expression levels, and some even showed larger expression divergence. Despite the dynamic expression and individual differences, these miRNAs always showed consistent or similar deregulation patterns. The consistent deregulation may contribute to dynamic and coordinated interaction between different miRNAs in the regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families have important biological roles, and their specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  16. Approaches to Working with Children, Young People and Families for Traveller, Irish Traveller, Gypsy, Roma and Show People Communities. Annotated Bibliography for the Children's Workforce Development Council

    Science.gov (United States)

    Robinson, Mark; Martin, Kerry; Wilkin, Carol

    2008-01-01

    This annotated bibliography relays a range of issues and approaches to working with Travellers, Irish Travellers, Gypsies, Roma and Show People. This is an accompanying document to the literature review report, ED501860.

  17. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    Full Text Available The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae) is an economically important species in molluscan aquaculture due to its use in pearl farming. The species has been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species are concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS) technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction. The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO) terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2) were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs) were identified from 61,141 unigenes (size of >1 kb), with the most abundant being dinucleotide repeats. This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata
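
    The simple sequence repeat (SSR) screen described in the abstract above can be illustrated with a regular-expression search for dinucleotide repeats. This is a sketch only: the abstract does not state which tool or thresholds were used, so the minimum of six repeat units, the function name and the example sequence are all assumptions.

```python
import re
from typing import List, Tuple


def find_dinucleotide_ssrs(sequence: str, min_repeats: int = 6) -> List[Tuple[str, int, int]]:
    """Return (motif, 0-based start, repeat count) for each dinucleotide SSR.

    Assumption: an SSR is a 2-base motif repeated at least `min_repeats`
    times in tandem; runs of a single base (e.g. AAAA...) are ignored.
    """
    # One captured 2-base motif followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # homopolymer run, not a true dinucleotide repeat
        hits.append((motif, match.start(), len(match.group(0)) // 2))
    return hits


# Hypothetical unigene fragment containing seven tandem AG units.
print(find_dinucleotide_ssrs("TTT" + "AG" * 7 + "CCC"))
```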

  18. Differential roles of TGIF family genes in mammalian reproduction

    Directory of Open Access Journals (Sweden)

    Renfree Marilyn B

    2011-09-01

    Full Text Available Abstract Background: TG-interacting factors (TGIFs) belong to a family of TALE-homeodomain proteins including TGIF1, TGIF2 and TGIFLX/Y in human. Both TGIF1 and TGIF2 act as transcription factors repressing TGF-β signalling. Human TGIFLX and its orthologue, Tex1 in the mouse, are X-linked genes that are only expressed in the adult testis. TGIF2 arose from TGIF1 by duplication, whereas TGIFLX arose by retrotransposition to the X-chromosome. These genes have not been characterised in any non-eutherian mammals. We therefore studied the TGIF family in the tammar wallaby (a marsupial mammal) to investigate their roles in reproduction and how and when these genes may have evolved their functions and chromosomal locations. Results: Both TGIF1 and TGIF2 were present in the tammar genome on autosomes but TGIFLX was absent. Tammar TGIF1 shared a similar expression pattern during embryogenesis, sexual differentiation and in adult tissues to that of TGIF1 in eutherian mammals, suggesting it has been functionally conserved. Tammar TGIF2 was ubiquitously expressed throughout early development as in the human and mouse, but in the adult, it was expressed only in the gonads and spleen, more like the expression pattern of human TGIFLX and mouse Tex1. Tammar TGIF2 mRNA was specifically detected in round and elongated spermatids. There was no mRNA detected in mature spermatozoa. TGIF2 protein was specifically located in the cytoplasm of spermatids, and in the residual body and the mid-piece of the mature sperm tail. These data suggest that tammar TGIF2 may participate in spermiogenesis, like TGIFLX does in eutherians. TGIF2 was detected for the first time in the ovary with mRNA produced in the granulosa and theca cells, suggesting it may also play a role in folliculogenesis. 
Conclusions The restricted and very similar expression of tammar TGIF2 to X-linked paralogues in eutherians suggests that the evolution of TGIF1, TGIF2 and TGIFLX in eutherians was accompanied by

  19. Analysis of the WUSCHEL-RELATED HOMEOBOX gene family in Pinus pinaster: New insights into the gene family evolution.

    Science.gov (United States)

    Alvarez, José M; Bueno, Natalia; Cañas, Rafael A; Avila, Concepción; Cánovas, Francisco M; Ordás, Ricardo J

    2018-02-01

    WUSCHEL-RELATED HOMEOBOX (WOX) genes are key players controlling stem cells in plants and can be divided into three clades according to the time of their appearance during plant evolution. Our knowledge of stem cell function in vascular plants other than angiosperms is limited, although angiosperms separated from gymnosperms ca. 300 million years ago and their patterning during embryogenesis differs significantly. For this reason, we have used the model gymnosperm Pinus pinaster to identify WOX genes and perform a thorough analysis of their gene expression patterns. Using transcriptomic data from a comprehensive range of tissues and stages of development we have shown three major outcomes: that the P. pinaster genome encodes at least fourteen members of the WOX family spanning all the major clades, that the genome of gymnosperms contains a WOX gene with no homologues in angiosperms representing a transitional stage between intermediate- and WUS-clade proteins, and that we can detect discrete WUS and WOX5 transcripts for the first time in a gymnosperm. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  20. Down-regulation of the cyprinid herpesvirus-3 annotated genes in cultured cells maintained at restrictive high temperature.

    Science.gov (United States)

    Ilouze, Maya; Dishon, Arnon; Kotler, Moshe

    2012-10-01

    Cyprinid herpesvirus-3 (CyHV-3) is a member of the Alloherpesviridae, in the order Herpesvirales. It causes a fatal disease in carp and koi fish. The disease is seasonal and is active when water temperatures range from 18 to 28 °C. Little is known about how and where the virus is preserved between the permissive seasons. The hallmark of the herpesviruses is their ability to become latent, persisting in the host in an apparently inactive state for varying periods of time. Hence, it could be expected that CyHV-3 enters a latent period. CyHV-3 has so far been shown to persist in fish maintained under restrictive temperatures, while shifting the fish to permissive conditions reactivates the virus. Previously, we demonstrated that cultured cells infected with CyHV-3 at 22 °C and subsequently transferred to a restrictive temperature of 30 °C preserve the virus for 30 days. The present report shows that cultured carp cells maintained and exposed to CyHV-3 at 30 °C are abortively infected; that is, autonomous viral DNA synthesis is hampered and the viral genome is not multiplied. Under these conditions, 91 of the 156 viral annotated ORFs were initially transcribed. These transcripts were down-regulated and gradually shut off over 18 days post-infection, while two viral transcripts encoded by ORFs 114 and 115 were preserved in the infected cells for 18 days p.i. These experiments, carried out in cultured cells, suggest that fish could be infected at a high non-permissive temperature and harbor the viral genome without producing viral particles. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. Identification of ALK as the Major Familial Neuroblastoma Predisposition Gene

    Science.gov (United States)

    Mossë, Yalë P; Laudenslager, Marci; Longo, Luca; Cole, Kristina A; Wood, Andrew; Attiyeh, Edward F; Laquaglia, Michael J; Sennett, Rachel; Lynch, Jill E; Perri, Patrizia; Laureys, Geneviève; Speleman, Frank; Hakonarson, Hakon; Torkamani, Ali; Schork, Nicholas J; Brodeur, Garrett M; Tonini, Gian Paolo; Rappaport, Eric; Devoto, Marcella; Maris, John M

    2009-01-01

    SUMMARY Survival rates for the childhood cancer neuroblastoma have not substantively improved despite dramatic escalation in chemotherapy intensity. Like most human cancers, this embryonal malignancy can be inherited, but the genetic etiology of familial and sporadically occurring neuroblastoma was largely unknown. Here we show that germline mutations in the anaplastic lymphoma kinase gene (ALK) explain the majority of hereditary neuroblastomas, and that activating mutations can also be somatically acquired. We first identified a significant linkage signal at the short arm of chromosome 2 (maximum nonparametric LOD=4.23 at rs1344063) using a whole-genome scan in neuroblastoma pedigrees. Resequencing of regional candidate genes identified three separate missense mutations in the tyrosine kinase domain of ALK (G1128A, R1192P and R1275Q) that segregated with the disease in eight separate families. Examination of 491 sporadically occurring human neuroblastoma samples showed that the ALK locus was gained in 22.8%, and highly amplified in an additional 3.3%, and that these aberrations were highly associated with death from disease (P=0.0003). Resequencing of 194 high-risk neuroblastoma samples showed somatically acquired mutations within the tyrosine kinase domain in 12.4%. Nine of the ten mutations map to critical regions of the kinase domain and were predicted to be oncogenic drivers with high probability. Mutations resulted in constitutive phosphorylation consistent with activation, and targeted knockdown of ALK mRNA resulted in profound growth inhibition of 4 of 4 cell lines harboring mutant or amplified ALK, as well as 2 of 6 wild type for ALK. Our results demonstrate that heritable mutations of ALK are the major cause of familial neuroblastoma, and that germline or acquired activation of this cell surface kinase is a tractable therapeutic target for this lethal pediatric malignancy. PMID:18724359

  2. Pipeline to upgrade the genome annotations

    Directory of Open Access Journals (Sweden)

    Lijin K. Gopi

    2017-12-01

    Full Text Available The current era of functional genomics is enriched with good-quality draft genomes and annotations for many thousands of species and varieties, supported by advances in next generation sequencing (NGS) technologies. Around 25,250 genomes, from organisms of various kingdoms, have been submitted to the NCBI genome resource to date. Each of these genomes was annotated using the tools and knowledge-bases that were available at the time of annotation. These annotations can be expected to improve if the same genome is re-annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases, that is capable of producing better quality annotations from the consensus of the predictions of different tools. This resource also performs various additional annotations, apart from the usual gene predictions and functional annotations, which involve SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides etc. This new annotation resource is trained to evaluate and integrate all the predictions together to resolve overlaps and ambiguities in the boundaries. One of the important highlights of this resource is its capability of predicting the phylogenetic relations of the repeats using evolutionary trace analysis and orthologous gene clusters. We also present a case study of the pipeline, in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus). It is demonstrated that this resource is capable of producing an improved annotation for a better understanding of the biology of various organisms.

  3. Genome-Wide Analysis of the Expression of WRKY Family Genes in Different Developmental Stages of Wild Strawberry (Fragaria vesca) Fruit.

    Directory of Open Access Journals (Sweden)

    Heying Zhou

    Full Text Available WRKY proteins play important regulatory roles in plant developmental processes such as senescence, trichome initiation and embryo morphogenesis. In strawberry, only FaWRKY1 (Fragaria × ananassa) has been characterized, leaving numerous WRKY genes to be identified and their function characterized. The publication of the draft genome sequence of the strawberry genome allowed us to conduct a genome-wide search for WRKY proteins in Fragaria vesca, and to compare the identified proteins with their homologs in model plants. Fifty-nine FvWRKY genes were identified and annotated from the F. vesca genome. Detailed analyses, including gene classification, annotation, phylogenetic evaluation, conserved motif determination and expression profiling based on RNA-seq data, were performed on all members of the family. Additionally, the expression patterns of the WRKY genes in different fruit developmental stages were further investigated using qRT-PCR, to provide a foundation for further comparative genomics and functional studies of this important class of transcriptional regulators in strawberry.

  4. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    Science.gov (United States)

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  5. Molecular evolution of the polyamine oxidase gene family in Metazoa

    Directory of Open Access Journals (Sweden)

    Polticelli Fabio

    2012-06-01

    Full Text Available Abstract Background Polyamine oxidase enzymes catalyze the oxidation of polyamines and acetylpolyamines. Since polyamines are basic regulators of cell growth and proliferation, their homeostasis is crucial for cell life. Members of the polyamine oxidase gene family have been identified in a wide variety of animals, including vertebrates, arthropods, nematodes, placozoa, as well as in plants and fungi. Polyamine oxidases (PAOs) from yeast can oxidize spermine, N1-acetylspermine, and N1-acetylspermidine; in vertebrates, however, two different enzymes, namely spermine oxidase (SMO) and acetylpolyamine oxidase (APAO), specifically catalyze the oxidation of spermine, and N1-acetylspermine/N1-acetylspermidine, respectively. Little is known about the molecular evolutionary history of these enzymes. However, since the yeast PAO is able to catalyze the oxidation of both acetylated and non-acetylated polyamines, and in vertebrates these functions are carried out by two specialized polyamine oxidase subfamilies (APAO and SMO), it can be hypothesized that the former enzyme resembles an ancestral form from which the latter were derived. Results We analysed 36 SMO, 26 APAO, and 14 PAO homologue protein sequences from 54 taxa including various vertebrates and invertebrates. The analysis of the full-length sequences and the principal domains of vertebrate and invertebrate PAOs yielded consensus primary protein sequences for vertebrate SMOs and APAOs, and invertebrate PAOs. This analysis, coupled to molecular modeling techniques, also unveiled sequence regions that confer specific structural and functional properties, including substrate specificity, on the different PAO subfamilies. Molecular phylogenetic trees revealed a basal position of all the invertebrate PAO enzymes relative to vertebrate SMOs and APAOs. PAOs from insects constitute a monophyletic clade. Two PAO variants sampled in the amphioxus are basal to the dichotomy between two well supported

  6. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
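
    The idea of an "extended annotation" built by combining the output of several annotation methods can be illustrated with a minimal sketch. This is not BEACON's actual algorithm or interface: the function name, the list of uninformative labels, and the gene identifiers below are all hypothetical, and the merge rule (keep the first informative function seen for each gene) is an assumption made for illustration only.

```python
# Hypothetical labels treated as "no function assigned" (an assumption).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}


def is_informative(function):
    """Return True if the label names a putative function."""
    return function.strip().lower() not in UNINFORMATIVE


def combine_annotations(*annotation_sets):
    """Merge gene -> function mappings from several annotation methods.

    For each gene, the first informative label encountered is kept;
    an uninformative label is only kept if no method does better.
    """
    combined = {}
    for annotations in annotation_sets:
        for gene, function in annotations.items():
            if gene not in combined:
                combined[gene] = function
            elif is_informative(function) and not is_informative(combined[gene]):
                combined[gene] = function
    return combined


# Two made-up annotation sets for the same three genes.
method_a = {"g1": "DNA polymerase III", "g2": "hypothetical protein", "g3": "hypothetical protein"}
method_b = {"g1": "hypothetical protein", "g2": "ABC transporter permease", "g3": "hypothetical protein"}

extended = combine_annotations(method_a, method_b)
annotated = sum(is_informative(f) for f in extended.values())
print(extended, annotated)
```

    In this toy input each method alone assigns a function to one gene, while the combined mapping assigns a function to two, which is the kind of gain the abstract reports at genome scale.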

  7. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  8. The Zebrafish GenomeWiki: a crowdsourcing approach to connect the long tail for zebrafish gene annotation

    OpenAIRE

    Singh, Meghna; Bhartiya, Deeksha; Maini, Jayant; Sharma, Meenakshi; Singh, Angom Ramcharan; Kadarkaraisamy, Subburaj; Rana, Rajiv; Sabharwal, Ankit; Nanda, Srishti; Ramachandran, Aravindhakshan; Mittal, Ashish; Kapoor, Shruti; Sehgal, Paras; Asad, Zainab; Kaushik, Kriti

    2014-01-01

    A large repertoire of gene-centric data has been generated in the field of zebrafish biology. Although the bulk of these data are available in the public domain, most of them are not readily accessible or available in nonstandard formats. One major challenge is to unify and integrate these widely scattered data sources. We tested the hypothesis that active community participation could be a viable option to address this challenge. We present here our approach to create standards for assimilat...

  9. Annotation of differentially expressed genes in the somatic embryogenesis of musa and their location in the banana genome.

    Science.gov (United States)

    Maldonado-Borges, Josefina Ines; Ku-Cauich, José Roberto; Escobedo-Graciamedrano, Rosa Maria

    2013-01-01

    cDNA-AFLP analysis was used to study the genes expressed in zygotic and somatic embryogenesis of Musa acuminata Colla ssp. malaccensis, and a comparison was made between their transcript-derived fragments (TDFs) and the sequenced genome of the double haploid- (DH-) Pahang of the malaccensis subspecies that is available online. A total of 253 TDFs were detected with apparent sizes of 100-4000 bp using 5 pairs of AFLP primers, of which 21 were differentially expressed during the different stages of banana embryogenesis; 15 of the sequences matched DH-Pahang chromosomes, with 7 of them being homologous to gene sequences encoding either known or putative protein domains of higher plants. Four TDF sequences were located in all Musa chromosomes, while the rest were located in one or two chromosomes. Their putative individual function is briefly reviewed based on published information, and the potential roles of these genes in embryo development are discussed. Thus the availability of the Musa genome and the TDF sequence information presented here open new possibilities for in-depth molecular and biochemical study of zygotic and somatic embryogenesis of Musa.

  10. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    Science.gov (United States)

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
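
    The recall and precision figures quoted above follow the standard definitions: precision is the fraction of predicted gene-disease pairs that are correct, and recall is the fraction of reference pairs that were recovered. A minimal sketch of the calculation, assuming annotations are represented as sets of (gene, disease) pairs; the function name and the example pairs are hypothetical and are not taken from the study's data.

```python
def precision_recall(predicted, reference):
    """Compute (precision, recall) for predicted vs. reference pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up reference (validation) set and predictions, for illustration only.
reference = {
    ("BRCA1", "breast cancer"),
    ("TP53", "Li-Fraumeni syndrome"),
    ("CFTR", "cystic fibrosis"),
    ("HTT", "Huntington disease"),
}
predicted = {
    ("BRCA1", "breast cancer"),
    ("TP53", "Li-Fraumeni syndrome"),
    ("CFTR", "cystic fibrosis"),
    ("APOE", "asthma"),
}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    A method can score high precision with low recall, as reported for OMIM above, when nearly everything it predicts is correct but it covers only a small part of the reference set.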

  12. Genomewide analysis of MATE-type gene family in maize reveals ...

    Indian Academy of Sciences (India)

    Huasheng Zhu and Jiandong Wu contributed equally to this work. As a group of secondary active transporters, the MATE gene family consists of multiple genes that widely exist in ..... Roots of the stress-treated plants were collected at 0,.

  13. TreeFam: a curated database of phylogenetic trees of animal gene families

    DEFF Research Database (Denmark)

    Li, Heng; Coghlan, Avril; Ruan, Jue

    2006-01-01

    TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively......, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins...

  14. Identification of a novel gene family that includes the interferon-inducible human genes 6–16 and ISG12

    Directory of Open Access Journals (Sweden)

    Parker Nadeene

    2004-01-01

    Full Text Available Abstract Background The human 6–16 and ISG12 genes are transcriptionally upregulated in a variety of cell types in response to type I interferon (IFN). The predicted products of these genes are small (12.9 and 11.5 kDa respectively), hydrophobic proteins that share 36% overall amino acid identity. Gene disruption and over-expression studies have so far failed to reveal any biochemical or cellular roles for these proteins. Results We have used in silico analyses to identify a novel family of genes (the ISG12 gene family) related to both the human 6–16 and ISG12 genes. Each ISG12 family member codes for a small hydrophobic protein containing a conserved ~80 amino-acid motif (the ISG12 motif). So far we have detected 46 family members in 25 organisms, ranging from unicellular eukaryotes to humans. Humans have four ISG12 genes: the 6–16 gene at chromosome 1p35 and three genes (ISG12(a), ISG12(b) and ISG12(c)) clustered at chromosome 14q32. Mice have three family members (ISG12(a), ISG12(b1) and ISG12(b2)) clustered at chromosome 12F1 (syntenic with human chromosome 14q32). There does not appear to be a murine 6–16 gene. On the basis of phylogenetic analyses, genomic organisation and intron-alignments we suggest that this family has arisen through divergent inter- and intra-chromosomal gene duplication events. The transcripts from human and mouse genes are detectable, all but two (human ISG12(b) and ISG12(c)) being upregulated in response to type I IFN in the cell lines tested. Conclusions Members of the eukaryotic ISG12 gene family encode a small hydrophobic protein with at least one copy of a newly defined motif of ~80 amino-acids (the ISG12 motif). In higher eukaryotes, many of the genes have acquired a responsiveness to type I IFN during evolution suggesting that a role in resisting cellular or environmental stress may be a unifying property of all family members. Analysis of gene-function in higher eukaryotes is complicated by the possibility of

  15. Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

    Science.gov (United States)

    Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

    2013-11-01

    Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
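
    The "genome equivalents" reported for a BAC library are conventionally estimated as the total usable insert DNA divided by the genome size. A minimal sketch of that arithmetic follows; the function name, the clone count, and the genome size are assumptions chosen for illustration and are not values from the paper, and discounting empty-vector and organellar clones in this way is one common convention, not necessarily the authors' exact procedure.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_size_mb,
                       empty_fraction=0.0, organellar_fraction=0.0):
    """Estimate library coverage in genome equivalents (fold).

    Clones with empty vectors or organellar inserts are discounted
    before multiplying by the mean insert size.
    """
    usable_clones = n_clones * (1.0 - empty_fraction) * (1.0 - organellar_fraction)
    total_insert_kb = usable_clones * mean_insert_kb
    return total_insert_kb / (genome_size_mb * 1000.0)  # 1 Mb = 1000 kb


# Hypothetical library: 100,000 clones, 120 kb mean insert, 2,300 Mb genome.
coverage = genome_equivalents(100_000, 120, 2_300)
print(f"{coverage:.1f}-fold coverage")
```

    With these made-up inputs the estimate is a little over 5-fold, within the 4.7- to 8.4-fold range quoted for the ZMAP libraries.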

  16. IIS--Integrated Interactome System: a web-based platform for the annotation, analysis and visualization of protein-metabolite-gene-drug interactions by integrating a variety of data sources and tools.

    Science.gov (United States)

    Carazzolle, Marcelo Falsarella; de Carvalho, Lucas Miguel; Slepicka, Hugo Henrique; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Kobarg, Jörg; Meirelles, Gabriela Vaz

    2014-01-01

    High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two

  17. Cytokinin Regulation of Gene Expression in the AHP Gene Family in Arabidopsis thaliana

    Czech Academy of Sciences Publication Activity Database

    Hradilová, Jana; Malbeck, Jiří; Brzobohatý, Břetislav

    2007-01-01

    Vol. 26, No. 3 (2007), pp. 229-244 ISSN 0721-7595 R&D Projects: GA MŠk LN00A081; GA MŠk 1M06030; GA MŠk(CZ) LC06034; GA AV ČR(CZ) IAA600380507; GA AV ČR IAA600040612 Institutional research plan: CEZ:AV0Z50380511; CEZ:AV0Z50040702 Source of funding: V - other public sources; V - other public sources; V - other public sources; V - other public sources Keywords: gene expression * AHP gene family * cytokinin signal transduction Subject RIV: EF - Botanics Impact factor: 2.220, year: 2007

  18. Identification and expression profiling analysis of TCP family genes involved in growth and development in maize.

    Science.gov (United States)

    Chai, Wenbo; Jiang, Pengfei; Huang, Guoyu; Jiang, Haiyang; Li, Xiaoyu

    2017-10-01

    The TCP family is a group of plant-specific transcription factors. TCP genes encode proteins harboring a bHLH structure, known as the TCP domain, which is implicated in DNA binding and protein-protein interactions. TCP genes play important roles in plant development and have been evolutionarily and functionally elaborated in various plants; however, no overall phylogenetic analysis or expression profiling of TCP genes in Zea mays has been reported. In the present study, a systematic analysis of molecular evolution and functional prediction of TCP family genes in maize (Z. mays L.) has been conducted. We performed a genome-wide survey of TCP genes in maize, revealing the gene structure, chromosomal location and phylogenetic relationship of family members. Microsynteny between grass species and tissue-specific expression profiles were also investigated. In total, 29 TCP genes were identified in the maize genome, unevenly distributed on the 10 maize chromosomes. Additionally, ZmTCP genes were categorized into nine classes based on phylogeny, and purifying selection may largely be responsible for maintaining the functions of maize TCP genes. Moreover, microsynteny analysis suggested that TCP genes have been conserved during evolution. Finally, expression analysis revealed that most TCP genes are expressed in the stem and ear, which suggests that ZmTCP genes influence stem and ear growth. This result is consistent with the previous finding that maize TCP genes repress the growth of axillary organs and enable the formation of female inflorescences. Altogether, this study presents a thorough overview of the TCP family in maize and provides a new perspective on the evolution of this gene family. The results also indicate that TCP family genes may be involved in developmental stages under plant growing conditions. Additionally, our results will be useful for further functional analysis of the TCP gene family in maize.

  19. Molecular cloning of RBCS genes in Selaginella and the evolution of the rbcS gene family

    Directory of Open Access Journals (Sweden)

    Wang Bo

    2015-01-01

    Full Text Available. Rubisco small subunits (RBCS) are encoded by a nuclear rbcS multigene family in higher plants and green algae. However, owing to the lack of rbcS sequences in lycophytes, the characteristics of rbcS genes in lycophytes are unclear. Recently, the complete genome sequence of the lycophyte Selaginella moellendorffii provided the first insight into the rbcS gene family in lycophytes. To understand further the characteristics of rbcS genes in other Selaginella, the full-length rbcS genes (rbcS1 and rbcS2) from two other Selaginella species were isolated. Both rbcS1 and rbcS2 genes shared more than 97% identity among three Selaginella species. RBCS proteins from Selaginella contained the Pfam RBCS domain PF00101, which was a major domain of other plant RBCS proteins. To explore the evolution of the rbcS gene family across Selaginella and other plants, we identified and performed comparative analysis of the rbcS gene family among 16 model plants based on a genome-wide analysis. The results showed that (i) two rbcS genes were obtained in Selaginella, which is the second fewest number of rbcS genes among the 16 representative plants; (ii) an expansion of rbcS genes occurred in the moss Physcomitrella patens; (iii) only RBCS proteins from angiosperms contained the Pfam PF12338 domains; and (iv) a pattern of concerted evolution existed in the rbcS gene family. Our study provides new insights into the evolution of the rbcS gene family in Selaginella and other plants.

  20. Molecular characterization of edestin gene family in Cannabis sativa L.

    Science.gov (United States)

    Docimo, Teresa; Caruso, Immacolata; Ponzoni, Elena; Mattana, Monica; Galasso, Incoronata

    2014-11-01

    Globulins are the predominant class of seed storage proteins in a wide variety of plants. In many plant species globulins are present in several isoforms encoded by gene families. The major seed storage protein of Cannabis sativa L. is the globulin edestin, widely known for its nutritional potential. In this work, we report the isolation of seven cDNAs encoding edestin from the C. sativa variety Carmagnola. Southern blot hybridization is in agreement with the number of identified edestin genes. All seven sequences showed the characteristic globulin features, but they proved to be divergent members/forms of two edestin types. According to their sequence similarity, four forms named CsEde1A, CsEde1B, CsEde1C and CsEde1D have been assigned to edestin type 1, and the three forms CsEde2A, CsEde2B and CsEde2C to edestin type 2. Analysis of the coding sequences revealed a high percentage of similarity (98-99%) among the different forms belonging to the same type, which decreased significantly to approximately 64% between the forms belonging to different types. Quantitative RT-PCR analysis revealed that both edestin types are expressed in developing hemp seeds and the amount of CsEde1 was 4.44 ± 0.10 higher than CsEde2. Both edestin types exhibited a high percentage of arginine (11-12%), but CsEde2 was particularly rich in methionine residues (2.36%) compared with CsEde1 (0.82%). The amino acid composition determined in CsEde1 and CsEde2 types suggests that these seed proteins can be used to improve the nutritional quality of plant food-stuffs. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
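
    The arginine and methionine percentages quoted in this abstract are residue frequencies over a protein sequence. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions, and the sequence is not a real edestin sequence.

```python
from collections import Counter


def residue_percentages(protein: str) -> dict[str, float]:
    """Return the percentage of each amino acid (one-letter code) in a sequence."""
    # Normalise case and drop a stop symbol if one is present.
    seq = protein.upper().replace("*", "")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Toy 10-residue sequence for illustration only (hypothetical, not edestin).
toy_sequence = "MRRAKMRGGR"
composition = residue_percentages(toy_sequence)
print(f"Arg: {composition['R']:.1f}%  Met: {composition['M']:.1f}%")
```

    Applied to a real translated coding sequence, the same function would give per-residue percentages comparable to the figures reported in the abstract.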

  1. Concept annotation in the CRAFT corpus.

    Science.gov (United States)

    Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

    2012-07-09

    Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
    The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

  2. The red deer Cervus elaphus genome CerEla1.0: sequencing, annotating, genes, and chromosomes.

    Science.gov (United States)

    Bana, Nóra Á; Nyiri, Anna; Nagy, János; Frank, Krisztián; Nagy, Tibor; Stéger, Viktor; Schiller, Mátyás; Lakatos, Péter; Sugár, László; Horn, Péter; Barta, Endre; Orosz, László

    2018-01-02

    We present here the de novo genome assembly CerEla1.0 for the red deer, Cervus elaphus, an emblematic member of the natural megafauna of the Northern Hemisphere. Humans spread the species in the South. Today, the red deer is also a farm-bred animal and is becoming a model animal in biomedical and population studies. Stag DNA was sequenced at 74× coverage by Illumina technology. The ALLPATHS-LG assembly of the reads resulted in 34.7 × 10^3 scaffolds, 26.1 × 10^3 of which were utilized in CerEla1.0. The assembly spans 3.4 Gbp. For building the red deer pseudochromosomes, a pre-established genetic map was used for main anchor points. A nearly complete co-linearity was found between the mapmarker sequences of the deer genetic map and the order and orientation of the orthologous sequences in the syntenic bovine regions. Syntenies were also conserved at the in-scaffold level. The cM distances corresponded to 1.34 Mbp uniformly along the deer genome. Chromosomal rearrangements between deer and cattle were demonstrated. 2.8 × 10^6 SNPs, 365 × 10^3 indels and 19368 protein-coding genes were identified in CerEla1.0, along with positions for centromerons. CerEla1.0 demonstrates the utilization of dual references, i.e., when a target genome (here C. elaphus) already has a pre-established genetic map, and is combined with the well-established whole genome sequence of a closely related species (here Bos taurus). Genome-wide association studies (GWAS) that CerEla1.0 (NCBI, MKHE00000000) could serve for are discussed.

  3. Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

    Directory of Open Access Journals (Sweden)

    Mukesh Kumar Meena

    Full Text Available. Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals including stresses and, therefore, is important for plants' response process. Calcineurin B-like proteins (CBLs) are one of the plant calcium sensors, which sense and convey the changes in cytosolic Ca2+-concentration for response process. A search in four leguminous plant (soybean, Medicago truncatula, common bean and chickpea) genomes identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and some of them were multiplied during whole genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of CBL family in plant kingdom indicated its diverse origin but showed a remarkable conservation in overall protein structure with appearance of complex gene structure in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatment were studied. Most of the CaCBL genes exhibited high expression in flowers. Expression profile of CaCBL genes in response to different abiotic stresses and hormones related to development and stresses (ABA, auxin, cytokinin, SA and JA) at different time intervals suggests their diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.

  4. Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Meena, Mukesh Kumar; Ghawana, Sanjay; Sardar, Atish; Dwivedi, Vikas; Khandal, Hitaishi; Roy, Riti; Chattopadhyay, Debasis

    2015-01-01

    Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals including stresses and, therefore, is important for plants' response process. Calcineurin B-like proteins (CBLs) are one of the plant calcium sensors, which sense and convey the changes in cytosolic Ca2+-concentration for response process. A search in four leguminous plant (soybean, Medicago truncatula, common bean and chickpea) genomes identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and some of them were multiplied during whole genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of CBL family in plant kingdom indicated its diverse origin but showed a remarkable conservation in overall protein structure with appearance of complex gene structure in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatment were studied. Most of the CaCBL genes exhibited high expression in flowers. Expression profile of CaCBL genes in response to different abiotic stresses and hormones related to development and stresses (ABA, auxin, cytokinin, SA and JA) at different time intervals suggests their diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.

  5. The IQD gene family in soybean: structure, phylogeny, evolution and expression.

    Directory of Open Access Journals (Sweden)

    Lin Feng

    Full Text Available. Members of the plant-specific IQ67-domain (IQD) protein family are involved in plant development and the basal defense response. Although systematic characterization of this family has been carried out in Arabidopsis, tomato (Solanum lycopersicum), Brachypodium distachyon and rice (Oryza sativa), systematic analysis and expression profiling of this gene family in soybean (Glycine max) have not previously been reported. In this study, we identified and structurally characterized IQD genes in the soybean genome. A complete set of 67 soybean IQD genes (GmIQD1-67) was identified using Blast search tools, and the genes were clustered into four subfamilies (IQD I-IV) based on phylogeny. These soybean IQD genes are distributed unevenly across all 20 chromosomes, with 30 segmental duplication events, suggesting that segmental duplication has played a major role in the expansion of the soybean IQD gene family. Analysis of the Ka/Ks ratios showed that the duplicated genes of the GmIQD family primarily underwent purifying selection. Microsynteny was detected in most pairs: genes in clade 1-3 might be present in genome regions that were inverted, expanded or contracted after the divergence; most gene pairs in clade 4 showed high conservation with little rearrangement among these gene-residing regions. Of the soybean IQD genes examined, six were most highly expressed in young leaves, six in flowers, one in roots and two in nodules. Our qRT-PCR analysis of 24 soybean IQD III genes confirmed that these genes are regulated by MeJA stress. Our findings present a comprehensive overview of the soybean IQD gene family and provide insights into the evolution of this family. In addition, this work lays a solid foundation for further experiments aimed at determining the biological functions of soybean IQD genes in growth and development.
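
    The Ka/Ks reasoning in this abstract follows the usual convention: a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. A minimal sketch of that interpretation step is given below. The function name, the tolerance band around 1, and the example values are assumptions for illustration; the abstract does not report the individual Ka and Ks values or the software used to estimate them.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio for one duplicated gene pair.

    ka: nonsynonymous substitutions per nonsynonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: hypothetical band around 1.0 treated as "neutral"
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1.0 - tolerance:
        return "purifying"
    if ratio > 1.0 + tolerance:
        return "positive"
    return "neutral"


# Made-up values for illustration; not taken from the study.
print(classify_selection(ka=0.05, ks=0.25))  # ratio 0.2
```

    In practice Ka and Ks are estimated from codon alignments by dedicated tools, and a significance test is normally applied rather than a fixed tolerance band.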

  6. Diagnostic Yield of Sequencing Familial Hypercholesterolemia Genes in Severe Hypercholesterolemia

    Science.gov (United States)

    Khera, Amit V.; Won, Hong-Hee; Peloso, Gina M.; Lawson, Kim S.; Bartz, Traci M.; Deng, Xuan; van Leeuwen, Elisabeth M.; Natarajan, Pradeep; Emdin, Connor A.; Bick, Alexander G.; Morrison, Alanna C.; Brody, Jennifer A.; Gupta, Namrata; Nomura, Akihiro; Kessler, Thorsten; Duga, Stefano; Bis, Joshua C.; van Duijn, Cornelia M.; Cupples, L. Adrienne; Psaty, Bruce; Rader, Daniel J.; Danesh, John; Schunkert, Heribert; McPherson, Ruth; Farrall, Martin; Watkins, Hugh; Lander, Eric; Wilson, James G.; Correa, Adolfo; Boerwinkle, Eric; Merlini, Piera Angelica; Ardissino, Diego; Saleheen, Danish; Gabriel, Stacey; Kathiresan, Sekar

    2017-01-01

    Background: About 7% of US adults have severe hypercholesterolemia (untreated LDL cholesterol ≥190 mg/dl). Such high LDL levels may be due to familial hypercholesterolemia (FH), a condition caused by a single mutation in any of three genes. Lifelong elevations in LDL cholesterol in FH mutation carriers may confer CAD risk beyond that captured by a single LDL cholesterol measurement. Objectives: Assess the prevalence of a FH mutation among those with severe hypercholesterolemia and determine whether CAD risk varies according to mutation status beyond the observed LDL cholesterol. Methods: Three genes causative for FH (LDLR, APOB, PCSK9) were sequenced in 26,025 participants from 7 case-control studies (5,540 CAD cases, 8,577 CAD-free controls) and 5 prospective cohort studies (11,908 participants). FH mutations included loss-of-function variants in LDLR, missense mutations in LDLR predicted to be damaging, and variants linked to FH in ClinVar, a clinical genetics database. Results: Among 8,577 CAD-free control participants, 430 had LDL cholesterol ≥190 mg/dl; of these, only eight (1.9%) carried a FH mutation. Similarly, among 11,908 participants from 5 prospective cohorts, 956 had LDL cholesterol ≥190 mg/dl and of these, only 16 (1.7%) carried a FH mutation. Within any stratum of observed LDL cholesterol, risk of CAD was higher among FH mutation carriers when compared with non-carriers. When compared to a reference group with LDL cholesterol <130 mg/dl and no mutation, participants with LDL cholesterol ≥190 mg/dl and no FH mutation had six-fold higher risk for CAD (OR 6.0; 95%CI 5.2–6.9) whereas those with LDL cholesterol ≥190 mg/dl as well as a FH mutation demonstrated twenty-two fold increased risk (OR 22.3; 95%CI 10.7–53.2). Conclusions: Among individuals with LDL cholesterol ≥190 mg/dl, gene sequencing identified a FH mutation in <2%. However, for any given observed LDL cholesterol, FH mutation carriers are at substantially increased risk for CAD.
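
    The carrier percentages in this abstract can be reproduced directly from the counts it reports. The short sketch below recomputes them and checks that the study arms add up to the stated total; the function and variable names are illustrative assumptions, and only numbers that appear in the abstract are used.

```python
def carrier_percentage(carriers: int, total: int) -> float:
    """Percentage of participants carrying a mutation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * carriers / total


# Counts as reported in the abstract.
controls_pct = carrier_percentage(8, 430)   # CAD-free controls with LDL >= 190 mg/dl
cohorts_pct = carrier_percentage(16, 956)   # prospective-cohort participants with LDL >= 190 mg/dl
print(f"controls: {controls_pct:.1f}%  cohorts: {cohorts_pct:.1f}%")

# The case, control and cohort counts should sum to the stated total.
total_participants = 5540 + 8577 + 11908
print(total_participants)
```

    Both percentages fall below 2%, which is consistent with the conclusion stated in the abstract.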

  7. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

    Energy Technology Data Exchange (ETDEWEB)

    Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  8. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

    Science.gov (United States)

    Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  9. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232,831 genes and ~15,011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  10. Search for intracranial aneurysm susceptibility gene(s) using Finnish families

    Directory of Open Access Journals (Sweden)

    Ryynänen Markku

    2002-08-01

    Full Text Available. Background: Cerebrovascular disease is the third leading cause of death in the United States, and about one-fourth of cerebrovascular deaths are attributed to ruptured intracranial aneurysms (IA). Epidemiological evidence suggests that IAs cluster in families, and are therefore probably genetic. Identification of individuals at risk for developing IAs by genetic tests will allow concentration of diagnostic imaging on high-risk individuals. We used model-free linkage analysis based on allele sharing with a two-stage design for a genome-wide scan to identify chromosomal regions that may harbor IA loci. Methods: We previously estimated sibling relative risk in the Finnish population at between 9 and 16, and proceeded with a genome-wide scan for loci predisposing to IA. In 85 Finnish families with two or more affected members, 48 affected sibling pairs (ASPs) were available for our genetic study. Power calculations indicated that 48 ASPs were adequate to identify chromosomal regions likely to harbor predisposing genes and that a liberal stage I lod score threshold of 0.8 provided a reasonable balance between detection of false positive regions and failure to detect real loci with moderate effect. Results: Seven chromosomal regions exceeded the stage I lod score threshold of 0.8 and five exceeded 1.0. The most significant region, on chromosome 19q, had a maximum multipoint lod score (MLS) of 2.6. Conclusions: Our study provides evidence for the locations of genes predisposing to IA. Further studies are necessary to elucidate the genes and their role in the pathophysiology of IA, and to design genetic tests.
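
    A lod score is the base-10 logarithm of a likelihood ratio (linkage versus no linkage), so the thresholds in this abstract can be read as odds. The sketch below only performs that unit conversion; it does not implement the model-free allele-sharing analysis used in the study, and the function names are illustrative assumptions.

```python
import math


def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a lod score to the likelihood ratio it represents."""
    return 10.0 ** lod


def likelihood_ratio_to_lod(ratio: float) -> float:
    """Convert a likelihood ratio back to a lod score."""
    if ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(ratio)


# Values quoted in the abstract: stage I threshold, the 1.0 mark, and the 19q maximum.
for lod in (0.8, 1.0, 2.6):
    print(f"lod {lod:.1f} -> likelihood ratio of about {lod_to_likelihood_ratio(lod):.1f}")
```

    Seen this way, the stage I threshold of 0.8 is deliberately permissive, while the chromosome 19q maximum of 2.6 corresponds to a likelihood ratio of a few hundred to one.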

  11. Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads.

    Science.gov (United States)

    Huson, Daniel H; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P; Nieselt, Kay; Williams, Rohan

    2017-01-25

    Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
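
    The selection step described in this abstract, where reads are binned to gene families and only one family's reads are passed to assembly, can be sketched as follows. This is not MEGAN's implementation: the function names, the input format of (read, family) pairs, and the family identifiers are hypothetical, and the assembly itself is left out.

```python
from collections import defaultdict
from typing import Iterable


def bin_reads_by_family(assignments: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group read identifiers by the gene family each read was assigned to.

    assignments: (read_id, family_id) pairs, e.g. derived from protein alignments.
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for read_id, family_id in assignments:
        bins[family_id].append(read_id)
    return dict(bins)


def reads_for_family(bins: dict[str, list[str]], family_id: str) -> list[str]:
    """Return the reads to hand to a gene-centric assembler for one family."""
    return bins.get(family_id, [])


# Hypothetical read assignments; the family labels are placeholders.
example = [("read1", "famA"), ("read2", "famB"), ("read3", "famA")]
bins = bin_reads_by_family(example)
print(reads_for_family(bins, "famA"))
```

    Restricting assembly to one bin at a time is what keeps the problem small compared with whole-metagenome assembly.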

  12. annot8r: GO, EC and KEGG annotation of EST datasets

    Directory of Open Access Journals (Sweden)

    Schmid Ralf

    2008-04-01

    Full Text Available. Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non

  13. Molecular analysis of the NDP gene in two families with Norrie disease.

    Science.gov (United States)

    Rivera-Vega, M Refugio; Chiñas-Lopez, Silvet; Vaca, Ana Luisa Jimenez; Arenas-Sordo, M Luz; Kofman-Alfaro, Susana; Messina-Baas, Olga; Cuevas-Covarrubias, Sergio Alberto

    2005-04-01

    To describe the molecular defects in the Norrie disease protein (NDP) gene in two families with Norrie disease (ND). We analysed two families with ND at the molecular level through polymerase chain reaction, DNA sequence analysis and GeneScan. Two molecular defects were found in the NDP gene: a missense mutation (265C > G) within codon 97 that resulted in the substitution of arginine by proline, and a partial deletion in the untranslated 3' region of exon 3 of the NDP gene. Clinical findings were more severe in the family that presented the partial deletion. We also diagnosed the carrier status of one daughter through GeneScan; this method proved to be a useful tool for establishing female carriers of ND. Here we report two novel mutations in the NDP gene in Mexican patients and propose that GeneScan is a viable means of establishing ND carrier status.

  14. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...

  15. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) progr...... is freely available on a web server at http://fgf.genomics.org.cn/...

  16. Multiple independent insertions of 5S rRNA genes in the spliced-leader gene family of trypanosome species.

    Science.gov (United States)

    Beauparlant, Marc A; Drouin, Guy

    2014-02-01

    Analyses of the 5S rRNA genes found in the spliced-leader (SL) gene repeat units of numerous trypanosome species suggest that such linkages were not inherited from a common ancestor, but were the result of independent 5S rRNA gene insertions. In trypanosomes, 5S rRNA genes are found either in the tandemly repeated units coding for SL genes or in independent tandemly repeated units. Given that trypanosome species where 5S rRNA genes are within the tandemly repeated units coding for SL genes are phylogenetically related, one might hypothesize that this arrangement is the result of an ancestral insertion of 5S rRNA genes into the tandemly repeated SL gene family of trypanosomes. Here, we use the types of 5S rRNA genes found associated with SL genes, the flanking regions of the inserted 5S rRNA genes and the position of these insertions to show that most of the 5S rRNA genes found within SL gene repeat units of trypanosome species were not acquired from a common ancestor but are the results of independent insertions. These multiple 5S rRNA gene insertion events in trypanosomes are likely the result of frequent founder events in different hosts and/or geographical locations in species having short generation times.

  17. Gender in childhood obesity: family environment, hormones, and genes.

    Science.gov (United States)

    Wisniewski, Amy B; Chernausek, Steven D

    2009-01-01

    The prevalence of obesity among children in the United States represents a pool of latent morbidity. Though the prevalence of obesity has increased in both boys and girls, the causes and consequences differ between the sexes. Thus, interventions proposed to treat and prevent childhood obesity will need to account for these differences. This review examines gender differences in the presentation of obesity in children and describes environmental, hormonal, and genetic factors that contribute to observed gender differences. A search of peer-reviewed, published literature was performed with PubMed for articles published from January 1974 through October 2008. Search terms used were obesity, sex, gender, hormones, family environment, body composition, adiposity, and genes. Studies of children aged 0 to 18 years were included, and only articles published in English were reviewed for consideration. Articles that illustrated gender differences in either the presentation or underlying mechanisms of obesity in children were reviewed for content, and their bibliographies were used to identify other relevant literature. Gender differences in childhood obesity have been understudied partially because of how we define the categories of overweight and obesity. Close examination of studies revealed that gender differences were common, both before and during puberty. Boys and girls differ in body composition, patterns of weight gain, hormone biology, and the susceptibility to certain social, ethnic, genetic, and environmental factors. Our understanding of how gender differences in pediatric populations relate to the pathogenesis of obesity and the subsequent development of associated comorbid states is critical to developing and implementing both therapeutic and preventive interventions.

  18. Extensive lineage-specific gene duplication and evolution of the spiggin multi-gene family in stickleback

    Directory of Open Access Journals (Sweden)

    Nishida Mutsumi

    2007-11-01

    Background: The threespine stickleback (Gasterosteus aculeatus) has a characteristic reproductive mode; mature males build nests using a secreted glue-like protein called spiggin. Although recent studies reported multiple occurrences of genes that encode this glue-like protein spiggin in threespine and ninespine sticklebacks, it is still unclear how many genes compose the spiggin multi-gene family. Results: Genome sequence analysis of threespine stickleback showed that there are at least five spiggin genes and two pseudogenes, whereas a single spiggin homolog occurs in the genomes of other fishes. Comparative genome sequence analysis demonstrated that Muc19, a single-copy mucous gene in human and mouse, is an ortholog of spiggin. Phylogenetic and molecular evolutionary analyses of these sequences suggested that an ancestral spiggin gene originated from a member of the mucin gene family as a single gene in the common ancestor of teleosts, and gene duplications of spiggin have occurred in the stickleback lineage. There was inter-population variation in the copy number of spiggin genes and positive selection on some codons, indicating that additional gene duplication/deletion events and adaptive evolution at some amino acid sites may have occurred in each stickleback population. Conclusion: A number of spiggin genes exist in the threespine stickleback genome. Our results provide insight into the origin and dynamic evolutionary process of the spiggin multi-gene family in the threespine stickleback lineage. The dramatic evolution of genes for mucous substrates may have contributed to the generation of distinct characteristics such as "bio-glue" in vertebrates.

  19. Functional annotation of hierarchical modularity.

    Directory of Open Access Journals (Sweden)

    Kanchana Padmanabhan

    In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our

  20. Ubiquitous Annotation Systems

    DEFF Research Database (Denmark)

    Hansen, Frank Allan

    2006-01-01

    Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...

  1. Undefined familial colorectal cancer and the role of pleiotropism in cancer susceptibility genes.

    Science.gov (United States)

    Dobbins, Sara E; Broderick, Peter; Chubb, Daniel; Kinnersley, Ben; Sherborne, Amy L; Houlston, Richard S

    2016-10-01

    Although family history is a major risk factor for colorectal cancer (CRC) a genetic diagnosis cannot be obtained in over 50 % of familial cases when screened for known CRC cancer susceptibility genes. The genetics of undefined-familial CRC is complex and recent studies have implied additional clinically actionable mutations for CRC in susceptibility genes for other cancers. To clarify the contribution of non-CRC susceptibility genes to undefined-familial CRC we conducted a mutational screen of 114 cancer susceptibility genes in 847 patients with early-onset undefined-familial CRC and 1609 controls by analysing high-coverage exome sequencing data. We implemented American College of Medical Genetics and Genomics standards and guidelines for assigning pathogenicity to variants. Globally across all 114 cancer susceptibility genes no statistically significant enrichment of likely pathogenic variants was shown (6.7 % cases 57/847, 5.3 % controls 85/1609; P = 0.15). Moreover there was no significant enrichment of mutations in genes such as TP53 or BRCA2 which have been proposed for clinical testing in CRC. In conclusion, while we identified genes that may be considered interesting candidates as determinants of CRC risk warranting further research, there is currently scant evidence to support a role for genes other than those responsible for established CRC syndromes in the clinical management of familial CRC.
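    The case/control comparison reported above (57/847 cases versus 85/1609 controls carrying a likely pathogenic variant) can be reproduced as a simple two-sample test of proportions. The abstract does not say which test the authors used, so the pooled two-proportion z-test below is only an illustrative assumption, and the function name `two_proportion_z_test` is hypothetical. It yields a two-sided p-value close to the reported P = 0.15.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (normal approximation).

    Returns (z, two-sided p-value) for H0: both groups share one proportion.
    NOTE: illustrative choice of test; the source abstract does not name one.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the abstract.
cases_with_variant, n_cases = 57, 847
controls_with_variant, n_controls = 85, 1609

case_rate = 100 * cases_with_variant / n_cases
control_rate = 100 * controls_with_variant / n_controls
z, p = two_proportion_z_test(
    cases_with_variant, n_cases, controls_with_variant, n_controls
)
print(f"cases {case_rate:.1f} %, controls {control_rate:.1f} %, z = {z:.2f}, p = {p:.2f}")
```

    With counts this large the normal approximation is reasonable; an exact test (for example Fisher's) would be the more conservative choice for sparser per-gene tables.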

  2. Snap: an integrated SNP annotation platform

    DEFF Research Database (Denmark)

    Li, Shengting; Ma, Lijia; Li, Heng

    2007-01-01

    Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical...

  3. Conservation, Divergence, and Genome-Wide Distribution of PAL and POX A Gene Families in Plants.

    Science.gov (United States)

    Rawal, H C; Singh, N K; Sharma, T R

    2013-01-01

    Genome-wide identification and phylogenetic and syntenic comparison were performed for the genes responsible for phenylalanine ammonia lyase (PAL) and peroxidase A (POX A) enzymes in nine plant species representing very diverse groups like legumes (Glycine max and Medicago truncatula), fruits (Vitis vinifera), cereals (Sorghum bicolor, Zea mays, and Oryza sativa), trees (Populus trichocarpa), and model dicot (Arabidopsis thaliana) and monocot (Brachypodium distachyon) species. A total of 87 and 1045 genes in PAL and POX A gene families, respectively, have been identified in these species. The phylogenetic and syntenic comparison along with motif distributions shows a high degree of conservation of PAL genes, suggesting that these genes may predate monocot/eudicot divergence. The POX A family genes, present in clusters at the subtelomeric regions of chromosomes, might be evolving and expanding at a higher rate than the PAL gene family. Our analysis showed that during the expansion of the POX A gene family, many groups and subgroups have evolved, resulting in a high level of functional divergence among monocots and dicots. These results will act as a first step toward the understanding of monocot/eudicot evolution and functional characterization of these gene families in the future.

  4. Conservation, Divergence, and Genome-Wide Distribution of PAL and POX A Gene Families in Plants

    Directory of Open Access Journals (Sweden)

    H. C. Rawal

    2013-01-01

    Genome-wide identification and phylogenetic and syntenic comparison were performed for the genes responsible for phenylalanine ammonia lyase (PAL) and peroxidase A (POX A) enzymes in nine plant species representing very diverse groups like legumes (Glycine max and Medicago truncatula), fruits (Vitis vinifera), cereals (Sorghum bicolor, Zea mays, and Oryza sativa), trees (Populus trichocarpa), and model dicot (Arabidopsis thaliana) and monocot (Brachypodium distachyon) species. A total of 87 and 1045 genes in PAL and POX A gene families, respectively, have been identified in these species. The phylogenetic and syntenic comparison along with motif distributions shows a high degree of conservation of PAL genes, suggesting that these genes may predate monocot/eudicot divergence. The POX A family genes, present in clusters at the subtelomeric regions of chromosomes, might be evolving and expanding at a higher rate than the PAL gene family. Our analysis showed that during the expansion of the POX A gene family, many groups and subgroups have evolved, resulting in a high level of functional divergence among monocots and dicots. These results will act as a first step toward the understanding of monocot/eudicot evolution and functional characterization of these gene families in the future.

  5. Transcriptional profiling of the human fibrillin/LTBP gene family, key regulators of mesenchymal cell functions

    DEFF Research Database (Denmark)

    Davis, Margaret R.; Andersson, Robin; Severin, Jessica

    2014-01-01

    in the structure of the extracellular matrix and controlling the bioavailability of TGFβ family members. Genes encoding these proteins show differential expression in mesenchymal cell types which synthesize the extracellular matrix. We have investigated the promoter regions of the seven gene family members using...... of the family members were expressed in a range of mesenchymal and other cell types, often associated with use of alternative promoters or transcription start sites within a promoter in different cell types. FBN3 was the lowest expressed gene, and was found only in embryonic and fetal tissues. The different...

  6. Identification of a novel Gig2 gene family specific to non-amniote vertebrates.

    Directory of Open Access Journals (Sweden)

    Yi-Bing Zhang

    Gig2 (grass carp reovirus (GCRV)-induced gene 2) was first identified as a novel fish interferon (IFN)-stimulated gene (ISG). Overexpression of a zebrafish Gig2 gene can protect cultured fish cells from virus infection. In the present study, we identify a novel gene family that is comprised of genes homologous to the previously characterized Gig2. EST/GSS search and in silico cloning identify 190 Gig2 homologous genes in 51 vertebrate species ranging from lampreys to amphibians. A further large-scale search of vertebrate and invertebrate genome databases indicates that the Gig2 gene family is specific to non-amniotes including lampreys, sharks/rays, ray-finned fishes and amphibians. Phylogenetic analysis and synteny analysis reveal lineage-specific expansion of the Gig2 gene family and also provide valuable evidence for the fish-specific genome duplication (FSGD) hypothesis. Although Gig2 family proteins exhibit no significant sequence similarity to any known proteins, a typical Gig2 protein appears to consist of two conserved parts: an N-terminus that bears very low homology to the catalytic domains of poly(ADP-ribose) polymerases (PARPs), and a novel C-terminal domain that is unique to this gene family. Expression profiling of zebrafish Gig2 family genes shows that some duplicate pairs have diverged in function via acquisition of novel spatial and/or temporal expression under stresses. The specificity of this gene family to non-amniotes might contribute to a large extent to distinct physiology in non-amniote vertebrates.

  7. Genetic control of functional traits related to photosynthesis and water use efficiency in Pinus pinaster Ait. drought response: integration of genome annotation, allele association and QTL detection for candidate gene identification.

    Science.gov (United States)

    de Miguel, Marina; Cabezas, José-Antonio; de María, Nuria; Sánchez-Gómez, David; Guevara, María-Ángeles; Vélez, María-Dolores; Sáez-Laguna, Enrique; Díaz, Luis-Manuel; Mancha, Jose-Antonio; Barbero, María-Carmen; Collada, Carmen; Díaz-Sala, Carmen; Aranda, Ismael; Cervera, María-Teresa

    2014-06-12

    Understanding molecular mechanisms that control photosynthesis and water use efficiency in response to drought is crucial for plant species from dry areas. This study aimed to identify QTL for these traits in a Mediterranean conifer and tested their stability under drought. High density linkage maps for Pinus pinaster were used in the detection of QTL for photosynthesis and water use efficiency at three water irrigation regimes. A total of 28 significant and 27 suggestive QTL were found. QTL detected for photochemical traits accounted for the higher percentage of phenotypic variance. Functional annotation of genes within the QTL suggested 58 candidate genes for the analyzed traits. Allele association analysis in selected candidate genes showed three SNPs located in a MYB transcription factor that were significantly associated with efficiency of energy capture by open PSII reaction centers and specific leaf area. The integration of QTL mapping of functional traits, genome annotation and allele association yielded several candidate genes involved with molecular control of photosynthesis and water use efficiency in response to drought in a conifer species. The results obtained highlight the importance of maintaining the integrity of the photochemical machinery in P. pinaster drought response.

  8. Saltatory Evolution of the Ectodermal Neural Cortex Gene Family at the Vertebrate Origin

    Science.gov (United States)

    Feiner, Nathalie; Murakami, Yasunori; Breithut, Lisa; Mazan, Sylvie; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    The ectodermal neural cortex (ENC) gene family, whose members are implicated in neurogenesis, is part of the kelch repeat superfamily. To date, ENC genes have been identified only in osteichthyans, although other kelch repeat-containing genes are prevalent throughout bilaterians. The lack of elaborate molecular phylogenetic analysis with exhaustive taxon sampling has obscured the possible link of the establishment of this gene family with vertebrate novelties. In this study, we identified ENC homologs in diverse vertebrates by means of database mining and polymerase chain reaction screens. Our analysis revealed that the ENC3 ortholog was lost in the basal eutherian lineage through single-gene deletion and that the triplication between ENC1, -2, and -3 occurred early in vertebrate evolution. Including our original data on the catshark and the zebrafish, our comparison revealed high conservation of the pleiotropic expression pattern of ENC1 and shuffling of expression domains between ENC1, -2, and -3. Compared with many other gene families including developmental key regulators, the ENC gene family is unique in that conventional molecular phylogenetic inference could identify no obvious invertebrate ortholog. This suggests a composite nature of the vertebrate-specific gene repertoire, consisting not only of de novo genes introduced at the vertebrate origin but also of long-standing genes with no apparent invertebrate orthologs. Some of the latter, including the ENC gene family, may be too rapidly evolving to provide sufficient phylogenetic signals marking orthology to their invertebrate counterparts. Such gene families that experienced saltatory evolution likely remain to be explored and might also have contributed to phenotypic evolution of vertebrates. PMID:23843192

  9. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT) tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles

    Directory of Open Access Journals (Sweden)

    Boussaha Mekki

    2006-05-01

    Background: The Fragile Histidine Triad gene (FHIT) is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of human common fragile sites, where FHIT lies. Vesical tumors also affect cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. Results: The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split into ten exons as in man, with exons 5 to 9 coding for a 149-amino-acid protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced; they originate from alternative usage of three variants of exon 4 and are very close in size to the major human FHIT cDNAs. Conclusion: A comparative genomic approach made it possible to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will allow the study of the role of FHIT in bovine cancerogenesis

  10. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT) tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles.

    Science.gov (United States)

    Uboldi, Cristina; Guidi, Elena; Roperto, Sante; Russo, Valeria; Roperto, Franco; Di Meo, Giulia Pia; Iannuzzi, Leopoldo; Floriot, Sandrine; Boussaha, Mekki; Eggen, André; Ferretti, Luca

    2006-05-23

    The Fragile Histidine Triad gene (FHIT) is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of human common fragile sites, where FHIT lies. Vesical tumors also affect cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split into ten exons as in man, with exons 5 to 9 coding for a 149-amino-acid protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced; they originate from alternative usage of three variants of exon 4 and are very close in size to the major human FHIT cDNAs. A comparative genomic approach made it possible to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will allow the study of the role of FHIT in bovine cancerogenesis, especially of vesical papillomavirus-associated cancers of

  11. Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: The class II cytokine receptors and their ligands in mammals and fish

    Directory of Open Access Journals (Sweden)

    Mogensen Knud

    2003-07-01

    Background: The high degree of sequence conservation between coding regions in fish and mammals can be exploited to identify genes in mammalian genomes by comparison with the sequence of similar genes in fish. Conversely, experimentally characterized mammalian genes may be used to annotate fish genomes. However, gene families that escape this principle include the rapidly diverging cytokines that regulate the immune system, and their receptors. A classic example is the class II helical cytokines (HCII), including type I, type II and lambda interferons and IL10-related cytokines (IL10, IL19, IL20, IL22, IL24 and IL26), and their receptors (HCRII). Despite the report of a near complete pufferfish (Takifugu rubripes) genome sequence, these genes remain undescribed in fish. Results: We have used an original strategy based both on conserved amino acid sequence and gene structure to identify HCII and HCRII in the genome of another pufferfish, Tetraodon nigroviridis, that is amenable to laboratory experiments. The 15 genes that were identified are highly divergent and include a single interferon molecule, three IL10-related cytokines and their potential receptors together with two Tissue Factors (TF). Some of these genes form tandem clusters on the Tetraodon genome. Their expression pattern was determined in different tissues. Most importantly, Tetraodon interferon was identified and we show that the recombinant protein can induce antiviral MX gene expression in Tetraodon primary kidney cells. Similar results were obtained in zebrafish, which has 7 MX genes. Conclusion: We propose a scheme for the evolution of HCII and their receptors during the radiation of bony vertebrates and suggest that the diversification that played an important role in the fine-tuning of the ancestral mechanism for host defense against infections probably followed different pathways in amniotes and fish.

  12. Genome-wide characterization of Toll-like receptor gene family in common carp (Cyprinus carpio) and their involvement in host immune response to Aeromonas hydrophila infection.

    Science.gov (United States)

    Gong, Yiwen; Feng, Shuaisheng; Li, Shangqi; Zhang, Yan; Zhao, Zixia; Hu, Mou; Xu, Peng; Jiang, Yanliang

    2017-12-01

    The Toll-like receptor (TLR) gene family is a class of conserved pattern recognition receptors, which play an essential role in innate immunity providing efficient defense against invading microbial pathogens. Although TLRs have been extensively characterized in both invertebrates and vertebrates, a comprehensive analysis of TLRs in common carp is lacking. In the present study, we have conducted the first genome-wide systematic analysis of common carp (Cyprinus carpio) TLR genes. A set of 27 common carp TLR genes were identified and characterized. Sequence similarity analysis, functional domain prediction and phylogenetic analysis supported their annotation and orthologies. By examining the gene copy number of TLR genes across several vertebrates, gene duplications and losses were observed. The expression patterns of TLR genes were examined during early developmental stages and in various healthy tissues, and the results showed that TLR genes were ubiquitously expressed, indicating a likely role in maintaining homeostasis. Moreover, the differential expression of TLRs was examined after Aeromonas hydrophila infection, and showed that most TLR genes were induced, with diverse patterns. TLR1, TLR4-2, TLR4-3, TLR22-2 and TLR22-3 were significantly up-regulated at one or more timepoints, whereas TLR2-1, TLR4-1, TLR7-1 and TLR7-2 were significantly down-regulated. Our results suggested that TLR genes play critical roles in the common carp immune response. Collectively, our findings provide fundamental genomic resources for future studies on fish disease management and disease-resistance selective breeding strategy development.

  13. A shared promoter region suggests a common ancestor for the human VCX/Y, SPANX, and CSAG gene families and the murine CYPT family

    DEFF Research Database (Denmark)

    Hansen, Martin A; Nielsen, John E; Retelska, Dorota

    2008-01-01

    , sequences corresponding to the shared promoter region of the CYPT family were identified at 39 loci. Most loci were located immediately upstream of genes belonging to the VCX/Y, SPANX, or CSAG gene families. Sequence comparison of the loci revealed a conserved CYPT promoter-like (CPL) element featuring TATA...... cell types. The genomic regions harboring the gene families were rich in direct and inverted segmental duplications (SD), which may facilitate gene conversion and rapid evolution. The conserved CPL and the common expression profiles suggest that the human VCX/Y, SPANX, and CSAG2 gene families together......Many testis-specific genes from the sex chromosomes are subject to rapid evolution, which can make it difficult to identify murine genes in the human genome. The murine CYPT gene family includes 15 members, but orthologs were undetectable in the human genome. However, using refined homology search...

  14. A Patient With Desmoid Tumors and Familial FAP Having Frame Shift Mutation of the APC Gene

    Directory of Open Access Journals (Sweden)

    Sanambar Sadighi

    2017-02-01

    Desmoid tumors, characterized by monoclonal proliferation of myofibroblasts, can occur in 5-10% of patients with familial adenomatous polyposis (FAP) as an extra-colonic manifestation of the disease. FAP can develop when there is a germ-line mutation in the adenomatous polyposis coli gene. Although mild or attenuated FAP may follow mutations at the 5′ end of the gene, mutations at the 3′ end are more likely to produce a more severe manifestation of the disease. A 28-year-old woman was admitted to the Cancer Institute of Iran with a painful abdominal mass. She had a strong family history of FAP and underwent prophylactic total colectomy. Pre-operative CT scans revealed a large mass. Microscopic observation showed diffuse fibroblast cell infiltration of the adjacent tissue structures. Peripheral blood DNA extraction followed by exon-by-exon sequencing of the adenomatous polyposis coli gene was performed to investigate the mutation in this gene. Analysis of the DNA sequencing demonstrated a 4-bp deletion at codons 1309-1310 of exon 16 of the adenomatous polyposis coli gene sequence, which was repeated in 3 members of the family. Some of them had desmoid tumors without a classical FAP history. Even when there is no familial history of adenomatous polyposis, the adenomatous polyposis coli gene mutation should be investigated in cases of familial desmoid tumors for suitable prevention. The 3′ end of the adenomatous polyposis coli gene is still the most likely location in such families.

  15. Repeat-associated plasticity in the Helicobacter pylori RD gene family.

    Science.gov (United States)

    Shak, Joshua R; Dick, Jonathan J; Meinersmann, Richard J; Perez-Perez, Guillermo I; Blaser, Martin J

    2009-11-01

    The bacterium Helicobacter pylori is remarkable for its ability to persist in the human stomach for decades without provoking sterilizing immunity. Since repetitive DNA can facilitate adaptive genomic flexibility via increased recombination, insertion, and deletion, we searched the genomes of two H. pylori strains for nucleotide repeats. We discovered a family of genes with extensive repetitive DNA that we have termed the H. pylori RD gene family. Each gene of this family is composed of a conserved 3' region, a variable mid-region encoding 7 and 11 amino acid repeats, and a 5' region containing one of two possible alleles. Analysis of five complete genome sequences and PCR genotyping of 42 H. pylori strains revealed extensive variation between strains in the number, location, and arrangement of RD genes. Furthermore, examination of multiple strains isolated from a single subject's stomach revealed intrahost variation in repeat number and composition. Despite prior evidence that the protein products of this gene family are expressed at the bacterial cell surface, enzyme-linked immunosorbent assay and immunoblot studies revealed no consistent seroreactivity to a recombinant RD protein by H. pylori-positive hosts. The pattern of repeats uncovered in the RD gene family appears to reflect slipped-strand mispairing or domain duplication, allowing for redundancy and subsequent diversity in genotype and phenotype. This novel family of hypervariable genes with conserved, repetitive, and allelic domains may represent an important locus for understanding H. pylori persistence in its natural host.

  16. Ancient signals: comparative genomics of plant MAPK and MAPKK gene families

    DEFF Research Database (Denmark)

    Hamel, Louis-Philippe; Nicole, Marie-Claude; Sritubtim, Somrudee

    2006-01-01

    MAPK signal transduction modules play crucial roles in regulating many biological processes in plants, and their components are encoded by highly conserved genes. The recent availability of genome sequences for rice and poplar now makes it possible to examine how well the previously described...... Arabidopsis MAPK and MAPKK gene family structures represent the broader evolutionary situation in plants, and analysis of gene expression data for MPK and MKK genes in all three species allows further refinement of those families, based on functionality. The Arabidopsis MAPK nomenclature appears sufficiently...

  17. ACID: annotation of cassette and integron data

    Directory of Open Access Journals (Sweden)

    Stokes Harold W

    2009-04-01

    Background: Although integrons and their associated gene cassettes are present in ~10% of bacteria and can represent up to 3% of the genome in which they are found, very few have been properly identified and annotated in public databases. These genetic elements have been overlooked in comparison to other vectors that facilitate lateral gene transfer between microorganisms. Description: By automating the identification of integron integrase genes and of the non-coding cassette-associated attC recombination sites, we were able to assemble a database containing all publicly available sequence information regarding these genetic elements. Specialists manually curated the database and this information was used to improve the automated detection and annotation of integrons and their encoded gene cassettes. ACID (annotation of cassette and integron data) can be searched using a range of queries and the data can be downloaded in a number of formats. Users can readily annotate their own data and integrate it into ACID using the tools provided. Conclusion: ACID is a community resource providing easy access to annotations of integrons and making tools available to detect them in novel sequence data. ACID also hosts a forum to prompt integron-related discussion, which can hopefully lead to a more universal definition of this genetic element.

  18. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
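
    The third codon position GC content mentioned in this abstract can be illustrated with a minimal sketch. It is not the implementation used by Artemis or MICheck; the function name `gc3_by_frame` is hypothetical, and it scores a whole sequence rather than the sliding windows a frame plot would use. In a GC-rich genome, the frame with the highest value is the likeliest coding frame.

```python
def gc3_by_frame(seq: str) -> list[float]:
    """Return the GC fraction at the third codon position for frames 0, 1 and 2.

    Illustrative helper only (hypothetical name); a real frame plot would
    compute this over sliding windows along the genome.
    """
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        # Third codon positions of this frame sit at offsets frame+2, frame+5, ...
        third_bases = seq[frame + 2::3]
        if not third_bases:
            fractions.append(0.0)
            continue
        gc_count = sum(base in "GC" for base in third_bases)
        fractions.append(gc_count / len(third_bases))
    return fractions


if __name__ == "__main__":
    # Codons ATC AAG TTC all end in G or C when read in frame 0.
    print(gc3_by_frame("ATCAAGTTC"))
```

    Comparing the three values per region against annotated coding regions is the kind of check the abstract describes for genomes with GC content above 60%.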

  19. The liver transcriptome of suckermouth armoured catfish (Pterygoplichthys anisitsi, Loricariidae): Identification of expansions in defensome gene families

    International Nuclear Information System (INIS)

    Parente, Thiago E.; Moreira, Daniel A.; Magalhães, Maithê G.P.; Andrade, Paula C.C. de; Furtado, Carolina; Haas, Brian J.; Stegeman, John J.; Hahn, Mark E.

    2017-01-01

    Pterygoplichthys is a genus of related suckermouth armoured catfishes native to South America, which have invaded tropical and subtropical regions worldwide. Physiological features, including an augmented resistance to organic xenobiotics, may have aided their settlement in foreign habitats. The liver transcriptome of Pterygoplichthys anisitsi was sequenced and used to characterize the diversity of mRNAs potentially involved in the responses to natural and anthropogenic chemicals. In total, 66,642 transcripts were assembled. Among the identified defensome genes, cytochromes P450 (CYP) were the most abundant, followed by sulfotransferases (SULT), nuclear receptors (NR) and ATP binding cassette transporters (ABC). A novel expansion in the CYP2Y subfamily was identified, as well as an independent expansion of the CYP2AAs. Two expansions were also observed among SULT1. Thirty-two transcripts were classified into twelve subfamilies of NR, while 21 encoded ABC transporters. The diversity of defensome transcripts sequenced herein could contribute to this species' resistance to organic xenobiotics. - Highlights: • Resistance of P. anisitsi to organic pollutants (e.g. Diesel oil) is elevated. • > 60 thousand transcripts were assembled at high sequencing depth. • > 20 thousand transcripts were annotated using UniProt and GO databases. • Complete coding sequence for 183 defensome transcripts were identified. • Most abundant defensome families were P450 followed by SULT, NR and ABC transporters.

  20. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    Directory of Open Access Journals (Sweden)

    Philipp H. Schiffer

    2016-08-01

    Full Text Available Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR) genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic for ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term “run-away evolution”. This process might ultimately lead to the failure of genomic integrity and drive species to extinction.

  1. Evolution of the defensin-like gene family in grass genomes

    Indian Academy of Sciences (India)

    that the DEFL gene family is subjected to purifying selection. However, sliding window analysis .... sorghum from DOE-JGI Community Sequencing Program ..... This work was supported by the National Key Technologies Research and ...

  2. Complexity of rice Hsp100 gene family: lessons from rice genome ...

    Indian Academy of Sciences (India)

    Madhu Sudhan

    2007-03-29

    Mar 29, 2007 ... Chaperonins are a class of molecular chaperones found in prokaryotes and in the ... Keywords. Chaperone, gene family, Hsp100, Oryza sativa ..... Sculpting the proteome with AAA+ proteases and disassembly machines; Cell ...

  3. Exclusion of known gene for enamel development in two Brazilian families with amelogenesis imperfecta.

    Science.gov (United States)

    Santos, Maria C L G; Hart, P Suzanne; Ramaswami, Mukundhan; Kanno, Cláudia M; Hart, Thomas C; Line, Sergio R P

    2007-01-31

    Amelogenesis imperfecta (AI) is a genetically heterogeneous group of diseases that result in defective development of tooth enamel. Mutations in several enamel proteins and proteinases have been associated with AI. The objective of this study was to evaluate evidence of etiology for the six major candidate gene loci in two Brazilian families with AI. Genomic DNA was obtained from family members and all exons and exon-intron boundaries of the ENAM, AMBN, AMELX, MMP20, KLK4 and Amelotin genes were amplified and sequenced. Each family was also evaluated for linkage to chromosome regions known to contain genes important in enamel development. The present study indicates that the AI in these two families is not caused by any of the known loci for AI or any of the major candidate genes proposed in the literature. These findings indicate extensive genetic heterogeneity for non-syndromic AI.

  4. A family with X-linked anophthalmia: exclusion of SOX3 as a candidate gene.

    Science.gov (United States)

    Slavotinek, Anne; Lee, Stephen S; Hamilton, Steven P

    2005-10-01

    We report on a four-generation family with X-linked anophthalmia in four affected males and show that this family has LOD scores consistent with linkage to Xq27, the third family reported to be linked to the ANOP1 locus. We sequenced the SOX3 gene at Xq27 as a candidate gene for the X-linked anophthalmia based on the high homology of this gene to SOX2, a gene previously mutated in bilateral anophthalmia. However, no amino acid sequence alterations were identified in SOX3. We have improved the definition of the phenotype in males with anophthalmia linked to the ANOP1 locus, as microcephaly, ocular colobomas, and severe renal malformations have not been described in families linked to ANOP1. (c) 2005 Wiley-Liss, Inc.

  5. "It's good to know": experiences of gene identification and result disclosure in familial epilepsies.

    Science.gov (United States)

    Vears, Danya F; Dunn, Karen L; Wake, Samantha A; Scheffer, Ingrid E

    2015-05-01

    Recognition of the role of genetics in the epilepsies has increased dramatically, impacting on clinical practice across many epilepsy syndromes. There is limited research investigating the impact of gene identification on individuals and families with epilepsy. While research has focused on the impact of delivering genetic information to families at the time of diagnosis in genetic diseases more broadly, little is known about how genetic results in epileptic diseases influence people's lives many years after they have been conveyed. This study used qualitative methods to explore the experience of receiving a genetic result in people with familial epilepsy. Interviews were conducted with individuals with familial epilepsies in whom the underlying genetic mutation had been identified. Recorded interviews underwent thematic analysis. 20 individuals from three families with different epilepsy syndromes and causative genes were interviewed. Multiple generations within families were studied. The mean time from receiving the genetic result prior to interview was 10.9 years (range 5-14 years). Three major themes were identified: 1) living with epilepsy: an individual's experience of the severity of epilepsy in their family influenced their view. 2) Clinical utility of the test: participants expressed varying reactions to receiving a genetic result. While for some it provided helpful information and relief, others were not surprised by the finding given the familial context. Some valued the use of genetic information for reproductive decision-making, particularly in the setting of severely affected family members. While altruistic reasons for participating in genetic research were discussed, participants emphasised the benefit of participation to them and their families. 3) 'Talking about the family genes': individuals reported poor communication between family members about their epilepsy and its genetic implications. The results provide important insights into the family

  6. Gene structure, phylogeny and expression profile of the sucrose synthase gene family in cacao (Theobroma cacao L.).

    Science.gov (United States)

    Li, Fupeng; Hao, Chaoyun; Yan, Lin; Wu, Baoduo; Qin, Xiaowei; Lai, Jianxiong; Song, Yinghui

    2015-09-01

    In higher plants, sucrose synthase (Sus, EC 2.4.1.13) is widely considered a key enzyme involved in sucrose metabolism. Although several paralogous genes encoding different isozymes of Sus have been identified and characterized in multiple plant genomes, to date detailed information about the Sus genes is lacking for cacao. This study reports the identification of six novel Sus genes from the economically important cacao tree. Analyses of the gene structure and phylogeny of the Sus genes demonstrated evolutionary conservation in the Sus family across cacao and other plant species. The expression of cacao Sus genes was investigated via real-time PCR in various tissues and in different developmental phases of leaf, flower bud and pod. The Sus genes exhibited distinct but partially redundant expression profiles in cacao, with TcSus1, TcSus5 and TcSus6 being the predominant genes in the bark with phloem, and TcSus2 predominantly expressed in the seed during the stereotype stage. TcSus3 and TcSus4 were detected at significantly higher levels in the pod husk and seed coat during pod development, and showed development-dependent expression profiles in the cacao pod. These results provide new insights into the evolution of the cacao Sus gene family, and basic information that will assist in elucidating its functions.

  7. Gene-Environment Interplay, Family Relationships, and Child Adjustment

    Science.gov (United States)

    Horwitz, Briana N.; Neiderhiser, Jenae M.

    2011-01-01

    This paper reviews behavioral genetic research from the past decade that has moved beyond simply studying the independent influences of genes and environments. The studies considered in this review have instead focused on understanding gene-environment interplay, including genotype-environment correlation (rGE) and genotype x environment…

  8. Identification and analysis of YELLOW protein family genes in the silkworm, Bombyx mori

    Directory of Open Access Journals (Sweden)

    Yi Yong-Zhu

    2006-08-01

    Full Text Available Abstract Background The major royal jelly proteins/yellow (MRJP/YELLOW) family possesses several physiological and chemical functions in the development of Apis mellifera and Drosophila melanogaster. Each protein of the family has a conserved domain named MRJP. However, there is no report of MRJP/YELLOW family proteins in the Lepidoptera. Results Using the Drosophila melanogaster YELLOW protein sequence as a BLAST query against the silkworm EST database, we found a gene family composed of seven members, each with a conserved MRJP domain, and named it the YELLOW protein family of Bombyx mori. We completed the cDNA sequences with the RACE method. The protein of each member possesses an MRJP domain and a putative cleavable signal peptide consisting of a hydrophobic sequence. In view of genetic evolution, the whole Bm YELLOW protein family composes a monophyletic group, which is distinctly separate from Drosophila melanogaster and Apis mellifera. We then showed the tissue expression profiles of Bm YELLOW protein family genes by RT-PCR. Conclusion A Bombyx mori YELLOW protein family is found to be composed of at least seven members. The low homogeneity and unique pattern of gene expression of each member of the family lead us to predict that the members of the Bm YELLOW protein family play important physiological roles in silkworm development.

  9. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    OpenAIRE

    Hu, H.; Haas, S.A.; Chelly, J.; Van Esch, H.; Raynaud, M.; de Brouwer, A.P.M.; Weinert, S.; Froyen, G.; Frints, S.G.M.; Laumonnier, F.; Zemojtel, T.; Love, M.I.; Richard, H.; Emde, A.K.; Bienek, M.

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of ...

  10. Distinct Gene Expression Signatures in Lynch Syndrome and Familial Colorectal Cancer Type X

    DEFF Research Database (Denmark)

    Valentin, Mev; Therkildsen, Christina; Veerla, Srinivas

    2013-01-01

    Heredity is estimated to cause at least 20% of colorectal cancer. The hereditary nonpolyposis colorectal cancer subset is divided into Lynch syndrome and familial colorectal cancer type X (FCCTX) based on presence of mismatch repair (MMR) gene defects.

  11. Novel genetic variants in miR-191 gene and familial ovarian cancer

    International Nuclear Information System (INIS)

    Shen, Jie; DiCioccio, Richard; Odunsi, Kunle; Lele, Shashikant B; Zhao, Hua

    2010-01-01

    Half of the familial aggregation of ovarian cancer can't be explained by any known risk genes, suggesting the existence of other genetic risk factors. Some of these unknown factors may not be traditional protein encoding genes. MicroRNA (miRNA) plays a critical role in tumorigenesis, but it is still unknown if variants in miRNA genes lead to predisposition to cancer. Considering the fact that miRNA regulates a number of tumor suppressor genes (TSGs) and oncogenes, genetic variations in miRNA genes could affect the levels of expression of TSGs or oncogenes and, thereby, cancer risk. To test this hypothesis in familial ovarian cancer, we screened for genetic variants in thirty selected miRNA genes, which are predicted to regulate key ovarian cancer genes and are reported to be misexpressed in ovarian tumor tissues, in eighty-three patients with familial ovarian cancer. All of the patients are non-carriers of any known BRCA1/2 or mismatch repair (MMR) gene mutations. Seven novel genetic variants were observed in four primary or precursor miRNA genes. Among them, three rare variants were found in the precursor or primary precursor of the miR-191 gene. In functional assays, the one variant located in the precursor of miR-191 resulted in conformational changes in the predicted secondary structures, and consequently altered the expression of mature miR-191. In further analysis, we found that this particular variant exists in five family members who had ovarian cancer. Our findings suggest that there are novel genetic variants in miRNA genes, and that certain genetic variants in miRNA genes can affect the expression of mature miRNAs and, consequently, might alter the regulation of TSGs or oncogenes. Additionally, the variant might be potentially associated with the development of familial ovarian cancer.

  12. a photoreceptor gene mutation in an indigenous black african family

    African Journals Online (AJOL)

    MUTATION IN AN INDIGENOUS BLACK AFRICAN FAMILY WITH RETINITIS PIGMENTOSA IDENTIFIED USING A RAPID SCREENING APPROACH FOR COMMON RHODOPSIN MUTATIONS. J Greenberg, T Franz, R Goliath, R Ramesar. Hereditary retinal degenerations may be subdivided into those affecting ...

  13. Evaluation of three automated genome annotations for Halorhabdus utahensis.

    Directory of Open Access Journals (Sweden)

    Peter Bakke

    2009-07-01

    Full Text Available Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.

  14. [Analysis of the NDP gene in a Chinese family with X-linked recessive Norrie disease].

    Science.gov (United States)

    Mei, Libin; Huang, Yanru; Pan, Qian; Liang, Desheng; Wu, Lingqian

    2015-05-01

    The purpose of the current research was to investigate the NDP (Norrie disease protein) gene in one Chinese family with Norrie disease (ND) and to characterize the related clinical features. Clinical data of the proband and his family members were collected. Complete ophthalmic examinations were carried out on the proband. Genomic DNA was extracted from peripheral blood leukocytes of 35 family members. Molecular analysis of the NDP gene was performed by polymerase chain reaction and direct sequencing of all exons and flanking regions. A hemizygous NDP missense mutation c.362G > A (p.Arg121Gln) in exon 3 was identified in the affected members, but not in any of the unaffected family individuals. The missense mutation c.362G > A in NDP is responsible for the Norrie disease in this family. This discovery will help provide the family members with accurate and reliable genetic counseling and prenatal diagnosis.

  15. Dictionary-driven protein annotation.

    Science.gov (United States)

    Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

    2002-09-01

    Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were

  16. Identification of the WRKY gene family and functional analysis of two genes in Caragana intermedia.

    Science.gov (United States)

    Wan, Yongqing; Mao, Mingzhu; Wan, Dongli; Yang, Qi; Yang, Feiyun; Mandlaa; Li, Guojing; Wang, Ruigang

    2018-02-09

    WRKY transcription factors, one of the largest families of transcriptional regulators in plants, play important roles in plant development and various stress responses. The WRKYs of Caragana intermedia are still not well characterized, although many WRKYs have been identified in various plant species. We identified 53 CiWRKY genes from C. intermedia transcriptome data, 28 of which exhibited complete open reading frames (ORFs). These CiWRKYs were divided into three groups via phylogenetic analysis according to their WRKY domains and zinc finger motifs. Conserved domain analysis showed that the CiWRKY proteins contain a highly conserved WRKYGQK motif and two variant motifs (WRKYGKK and WKKYEEK). The subcellular localization of CiWRKY26 and CiWRKY28-1 indicated that these two proteins localized exclusively to nuclei, supporting their role as transcription factors. The expression patterns of the 28 CiWRKYs with complete ORFs were examined through quantitative real-time PCR (qRT-PCR) in various tissues and under different abiotic stresses (drought, cold, salt, high-pH and abscisic acid (ABA)). The results showed that each CiWRKY responded to at least one stress treatment. Furthermore, overexpression of CiWRKY75-1 and CiWRKY40-4 in Arabidopsis thaliana suppressed the drought stress tolerance of the plants and delayed leaf senescence, respectively. Fifty-three CiWRKY genes from the C. intermedia transcriptome were identified and divided into three groups via phylogenetic analysis. The expression patterns of the 28 CiWRKYs under different abiotic stresses suggested that each CiWRKY responded to at least one stress treatment. Overexpression of CiWRKY75-1 and CiWRKY40-4 suppressed the drought stress tolerance of Arabidopsis and delayed leaf senescence, respectively. These results provide a basis for the molecular mechanism through which CiWRKYs mediate stress tolerance.

  17. Identification of a novel FBN1 gene mutation in a large Pakistani family with Marfan syndrome

    NARCIS (Netherlands)

    Micheal, S.; Khan, M.I.; Akhtar, F.; Weiss, M.M.; Islam, F.; Ali, M.; Qamar, R.; Maugeri, A.; Hollander, A.I. den

    2012-01-01

    PURPOSE: To describe a novel mutation in the fibrillin-1 (FBN1) gene in a large Pakistani family with autosomal dominant Marfan syndrome (MFS). METHODS: Blood samples were collected of 11 family members affected with Marfan syndrome, and DNA was isolated by phenol-extraction. The coding exons of

  18. The SOD gene family in tomato: identification, phylogenetic relationships and expression patterns

    Directory of Open Access Journals (Sweden)

    Kun Feng

    2016-08-01

    Full Text Available Superoxide dismutases (SODs) are critical antioxidant enzymes that protect organisms from reactive oxygen species (ROS) caused by adverse conditions, and have been widely found in the cytoplasm, chloroplasts, and mitochondria of eukaryotic and prokaryotic cells. Tomato (Solanum lycopersicum L.) is an important economic crop and is cultivated worldwide. However, abiotic and biotic stresses severely hinder growth and development of the plant, which affects the production and quality of the crop. To reveal the potential roles of SOD genes under various stresses, we performed a systematic analysis of the tomato SOD gene family and analyzed the expression patterns of SlSOD genes in response to abiotic stresses at the whole-genome level. The characteristics of the SlSOD gene family were determined by analyzing gene structure, conserved motifs, chromosomal distribution, phylogenetic relationships, and expression patterns. We determined that there are at least nine SOD genes in tomato, including four Cu/ZnSODs, three FeSODs, and one MnSOD, and they are unevenly distributed on 12 chromosomes. Phylogenetic analyses of SOD genes from tomato and other plant species were separated into two groups with a high bootstrap value, indicating that these SOD genes were present before the monocot-dicot split. Additionally, many cis-elements that respond to different stresses were found in the promoters of nine SlSOD genes. Gene expression analysis based on RNA-seq data showed that most genes were expressed in all tested tissues, with the exception of SlSOD6 and SlSOD8, which were only expressed in young fruits. Microarray data analysis showed that most members of the SlSOD gene family were altered under salt- and drought-stress conditions. This genome-wide analysis of SlSOD genes helps to clarify the function of SlSOD genes under different stress conditions and provides information to aid in further understanding the evolutionary relationships of SOD genes in plants.

  19. Genome-Wide Identification and Analysis of the TIFY Gene Family in Grape

    Science.gov (United States)

    Zhang, Yucheng; Gao, Min; Singer, Stacy D.; Fei, Zhangjun; Wang, Hua; Wang, Xiping

    2012-01-01

    Background The TIFY gene family constitutes a plant-specific group of genes with a broad range of functions. This family encodes four subfamilies of proteins, including ZML, TIFY, PPD and JASMONATE ZIM-Domain (JAZ) proteins. JAZ proteins are targets of the SCFCOI1 complex, and function as negative regulators in the JA signaling pathway. Recently, it has been reported in both Arabidopsis and rice that TIFY genes, and especially JAZ genes, may be involved in plant defense against insect feeding, wounding, pathogens and abiotic stresses. Nonetheless, knowledge concerning the specific expression patterns and evolutionary history of plant TIFY family members is limited, especially in a woody species such as grape. Methodology/Principal Findings A total of two TIFY, four ZML, two PPD and 11 JAZ genes were identified in the Vitis vinifera genome. Phylogenetic analysis of TIFY protein sequences from grape, Arabidopsis and rice indicated that the grape TIFY proteins are more closely related to those of Arabidopsis than those of rice. Both segmental and tandem duplication events have been major contributors to the expansion of the grape TIFY family. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologues of several grape TIFY genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of lineages that led to grape and Arabidopsis. Analyses of microarray and quantitative real-time RT-PCR expression data revealed that grape TIFY genes are not a major player in the defense against biotrophic pathogens or viruses. However, many of these genes were responsive to JA and ABA, but not SA or ET. Conclusion The genome-wide identification, evolutionary and expression analyses of grape TIFY genes should facilitate further research of this gene family and provide new insights regarding their evolutionary history and regulatory control. PMID:22984514

  20. Natural killer cell receptor genes in the family Equidae: not only Ly49.

    Directory of Open Access Journals (Sweden)

    Jan Futas

    Full Text Available Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for

  1. Natural Killer Cell Receptor Genes in the Family Equidae: Not only Ly49

    Science.gov (United States)

    Futas, Jan; Horin, Petr

    2013-01-01

    Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of

  2. carboxylate synthase gene family in Arabidopsis, rice, grapevine

    African Journals Online (AJOL)

    Yomi

    2012-01-16

    Jan 16, 2012 ... evolutionary relationships of ACS genes in the four plant species. Chromosomal .... classification was consistent with the report from. Jakubowicz et al. ..... Analysis of the genome sequence of the flowering plant Arabidopsis ...

  3. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
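
    The abstract does not name the clustering algorithm, so the following is only a minimal sketch of one common fast approach, greedy incremental clustering, with k-mer Jaccard similarity standing in for sequence identity. The function names, the k-mer size, the threshold, and the toy reads are illustrative assumptions, not RAMMCAP's implementation.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Greedy incremental clustering (illustrative, not RAMMCAP itself).

    Sequences are visited longest first; each one joins the first cluster
    whose representative is similar enough, otherwise it founds a new cluster.
    """
    clusters = []  # list of (representative k-mer set, [member sequences])
    for seq in sorted(seqs, key=len, reverse=True):
        ks = kmers(seq, k)
        for rep_kmers, members in clusters:
            if jaccard(ks, rep_kmers) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was close enough: start a new cluster.
            clusters.append((ks, [seq]))
    return [members for _, members in clusters]


# Hypothetical toy reads: two near-identical pairs.
reads = ["ATGGCGTACGTTAGC", "ATGGCGTACGTTAGG", "TTTTCCCCGGGGAAAA", "TTTTCCCCGGGGAAAT"]
for cluster in greedy_cluster(reads):
    print(cluster)
```

    Each read is compared only with cluster representatives rather than with every other read, which is the property that keeps this family of methods fast on large read sets.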

  4. Identification of pathogenic gene variants in small families with intellectually disabled siblings by exome sequencing.

    Science.gov (United States)

    Schuurs-Hoeijmakers, Janneke H M; Vulto-van Silfhout, Anneke T; Vissers, Lisenka E L M; van de Vondervoort, Ilse I G M; van Bon, Bregje W M; de Ligt, Joep; Gilissen, Christian; Hehir-Kwa, Jayne Y; Neveling, Kornelia; del Rosario, Marisol; Hira, Gausiya; Reitano, Santina; Vitello, Aurelio; Failla, Pinella; Greco, Donatella; Fichera, Marco; Galesi, Ornella; Kleefstra, Tjitske; Greally, Marie T; Ockeloen, Charlotte W; Willemsen, Marjolein H; Bongers, Ernie M H F; Janssen, Irene M; Pfundt, Rolph; Veltman, Joris A; Romano, Corrado; Willemsen, Michèl A; van Bokhoven, Hans; Brunner, Han G; de Vries, Bert B A; de Brouwer, Arjan P M

    2013-12-01

    Intellectual disability (ID) is a common neurodevelopmental disorder affecting 1-3% of the general population. Mutations in more than 10% of all human genes are considered to be involved in this disorder, although the majority of these genes are still unknown. We investigated 19 small non-consanguineous families with two to five affected siblings in order to identify pathogenic gene variants in known, novel and potential ID candidate genes. Non-consanguineous families have been largely ignored in gene identification studies as small family size precludes prior mapping of the genetic defect. Using exome sequencing, we identified pathogenic mutations in three genes, DDHD2, SLC6A8, and SLC9A6, of which the latter two have previously been implicated in X-linked ID phenotypes. In addition, we identified potentially pathogenic mutations in BCORL1 on the X-chromosome and in MCM3AP, PTPRT, SYNE1, and ZNF528 on autosomes. We show that potentially pathogenic gene variants can be identified in small, non-consanguineous families with as few as two affected siblings, thus emphasising their value in the identification of syndromic and non-syndromic ID genes.

  5. New mutations in the NHS gene in Nance-Horan Syndrome families from the Netherlands

    NARCIS (Netherlands)

    Florijn, Ralph J.; Loves, Willem; Maillette de Buy Wenniger-Prick, Liesbeth J. J. M.; Mannens, Marcel M. A. M.; Tijmes, Nel; Brooks, Simon P.; Hardcastle, Alison J.; Bergen, Arthur A. B.

    2006-01-01

    Mutations in the NHS gene cause Nance-Horan Syndrome (NHS), a rare X-chromosomal recessive disorder with variable features, including congenital cataract, microphthalmia, a peculiar form of the ear and dental anomalies. We investigated the NHS gene in four additional families with NHS from the

  6. Germline heterozygous variants in genes associated with familial hemophagocytic lymphohistiocytosis as a cause of increased bleeding

    DEFF Research Database (Denmark)

    Fager Ferrari, Marcus; Leinoe, Eva; Rossing, Maria

    2018-01-01

    Familial hemophagocytic lymphohistiocytosis (FHL) is caused by biallelic variants in genes regulating granule secretion in cytotoxic lymphocytes. In FHL3-5, the affected genes UNC13D, STX11 and STXBP2 have further been shown to regulate the secretion of platelet granules, giving rise to compromised...

  7. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    NARCIS (Netherlands)

    Hu, H; Haas, S.A.; Chelly, J.; Esch, H. Van; Raynaud, M.; Brouwer, A.P. de; Weinert, S.; Froyen, G.; Frints, S.G.; Laumonnier, F.; Zemojtel, T.; Love, M.I.; Richard, H.; Emde, A.K.; Bienek, M.; Jensen, C.; Hambrock, M.; Fischer, U.; Langnick, C.; Feldkamp, M.; Wissink-Lindhout, W.; Lebrun, N.; Castelnau, L.; Rucci, J.; Montjean, R.; Dorseuil, O.; Billuart, P.; Stuhlmann, T.; Shaw, M.; Corbett, M.A.; Gardner, A.; Willis-Owen, S.; Tan, C.; Friend, K.L.; Belet, S.; Roozendaal, K.E. van; Jimenez-Pocquet, M.; Moizard, M.P.; Ronce, N.; Sun, R.; O'Keeffe, S.; Chenna, R.; Bommel, A. van; Goke, J.; Hackett, A.; Field, M.; Christie, L.; Boyle, J.; Haan, E.; Nelson, J.; Turner, G.; Baynam, G.; Gillessen-Kaesbach, G.; Muller, U.; Steinberger, D.; Budny, B.; Badura-Stronka, M.; Latos-Bielenska, A.; Ousager, L.B.; Wieacker, P.; Rodriguez Criado, G.; Bondeson, M.L.; Anneren, G.; Dufke, A.; Cohen, M.; Maldergem, L. Van; Vincent-Delorme, C.; Echenne, B.; Simon-Bouy, B.; Kleefstra, T.; Willemsen, M.H.; Fryns, J.P.; Devriendt, K.; Ullmann, R.; Vingron, M.; Wrogemann, K.; Wienker, T.F.; Tzschach, A.; Bokhoven, H. van; Gecz, J.; Jentsch, T.J.; Chen, W.; Ropers, H.H.; Kalscheuer, V.M.

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or

  8. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    DEFF Research Database (Denmark)

    Hu, H; Haas, S A; Chelly, J

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes...

  9. Linkage studies and mutation analysis of the PDEB gene in 23 families with Leber congenital amaurosis

    DEFF Research Database (Denmark)

    Riess, O; Weber, B; Nørremølle, Anne

    1992-01-01

    as to whether mutations in the human PDEB gene might cause LCA. We have previously cloned and characterized the human homologue of the mouse Pdeb gene and have mapped it to chromosome 4p16.3. In this study, a total of 23 LCA families of various ethnic backgrounds have been investigated. Linkage analysis using...

  10. Expressional and Biochemical Characterization of Rice Disease Resistance Gene Xa3/Xa26 Family

    Institute of Scientific and Technical Information of China (English)

    Songjie Xu; Yinglong Cao; Xianghua Li; Shiping Wang

    2007-01-01

    The rice (Oryza sativa L.) Xa3/Xa26 gene, conferring race-specific resistance to bacterial blight disease and encoding a leucine-rich repeat (LRR) receptor kinase-like protein, belongs to a multigene family consisting of tandem clustered homologous genes, colocalizing with several uncharacterized genes for resistance to bacterial blight or fungal blast. To provide more information on the expressional and biochemical characteristics of the Xa3/Xa26 family, we analyzed the family members. Four Xa3/Xa26 family members in the indica rice variety Teqing, which carries a bacterial blight resistance gene with a chromosomal location tightly linked to Xa3/Xa26, and five Xa3/Xa26 family members in the japonica rice variety Nipponbare, which carries at least one uncharacterized blast resistance gene, were constitutively expressed in leaf tissue. The result suggests that some of the family members may be candidates for these uncharacterized resistance genes. At least five putative N-glycosylation sites in the LRR domain of the XA3/XA26 protein are not glycosylated. XA3/XA26 and its family members MRKa and MRKc all possess the consensus sequences of paired cysteines, which putatively function in dimerization of the receptor proteins for signal transduction, immediately before the first LRR and immediately after the last LRR. However, no homo-dimer between XA3/XA26 molecules and no hetero-dimer between XA3/XA26 and MRKa or MRKc was formed, indicating that the XA3/XA26 protein might function either as a monomer or as a hetero-dimer formed with another protein outside of the XA3/XA26 family. These results provide valuable information for further extensive investigation into this multiple protein family.

  11. A phylogenomic gene cluster resource: The phylogenetically inferred groups (PhIGs) database

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  12. The importance of melanoma inhibitory activity gene family in the tumor progression of oral cancer.

    Science.gov (United States)

    Sasahira, Tomonori; Bosserhoff, Anja Katrin; Kirita, Tadaaki

    2018-05-01

    Oral squamous cell carcinoma has a high potential for locoregional invasion and nodal metastasis. Consequently, early detection of such malignancies is of immense importance. The melanoma inhibitory activity (MIA) gene family comprises MIA, MIA2, transport and Golgi organization protein 1 (TANGO), and otoraplin (OTOR). These members of the MIA gene family have a highly conserved Src homology 3 (SH3)-like structure. Although the molecules of this family share 34-45% amino acid homology and 47-59% cDNA sequence homology, those members, excluding OTOR, play different tumor-associated functions. MIA has a pivotal role in the progression and metastasis of melanoma; MIA2 and TANGO have been suggested to possess tumor-suppressive functions; and OTOR is uniquely expressed in cochlea of the inner ear. Therefore, the definite functions of the MIA gene family in cancer cells remain unclear. Since the members of the MIA gene family are secreted proteins, these molecules might be useful tumor markers that can be detected in the body fluids, including serum and saliva. In this review, we described the molecular biological functions of the MIA gene family in oral cancer. © 2018 Japanese Society of Pathology and John Wiley & Sons Australia, Ltd.

  13. Genome-wide characterization of the WRKY gene family in radish (Raphanus sativus L.) reveals its critical functions under different abiotic stresses.

    Science.gov (United States)

    Karanja, Bernard Kinuthia; Fan, Lianxue; Xu, Liang; Wang, Yan; Zhu, Xianwen; Tang, Mingjia; Wang, Ronghua; Zhang, Fei; Muleke, Everlyne M'mbone; Liu, Liwang

    2017-11-01

    The radish WRKY gene family was identified genome-wide and found to play critical roles in the response to multiple abiotic stresses. WRKY is among the largest families of transcription factors (TFs) associated with multiple biological activities for plant survival, including the control of response mechanisms against abiotic stresses such as heat, salinity, and heavy metals. Radish is an important root vegetable crop, and therefore characterization and expression pattern investigation of WRKY transcription factors in radish is imperative. In the present study, 126 putative WRKY genes were retrieved from the radish genome database. Protein sequence and annotation scrutiny confirmed that RsWRKY proteins possessed highly conserved domains and a zinc finger motif. Based on phylogenetic analysis results, RsWRKY candidate genes were divided into three groups (Groups I, II and III) containing 31, 74, and 20 members, respectively. Additionally, gene structure analysis revealed that intron-exon patterns of the WRKY genes are highly conserved in radish. Linkage map analysis indicated that RsWRKY genes were distributed with varying densities over nine linkage groups. Further, RT-qPCR analysis illustrated the significant variation of 36 RsWRKY genes under one or more abiotic stress treatments, implying that they might be stress-responsive genes. In total, 126 WRKY TFs were identified from the R. sativus genome, of which 35 showed abiotic stress-induced expression patterns. These results provide a genome-wide characterization of RsWRKY TFs and a baseline for further functional dissection and molecular evolution investigation, specifically for improving abiotic stress resistance with the ultimate goal of increasing the yield and quality of radish.

  14. Duplications and losses in gene families of rust pathogens highlight putative effectors

    Directory of Open Access Journals (Sweden)

    Amanda L. Pendleton

    2014-06-01

    Rust fungi are a group of fungal pathogens that cause some of the world’s most destructive diseases of trees and crops. A shared characteristic among rust fungi is obligate biotrophy, the inability to complete a lifecycle without a host. This dependence on a host species likely affects patterns of gene expansion, contraction, and innovation within rust pathogen genomes. The establishment of disease by biotrophic pathogens is reliant upon effector proteins that are encoded in the fungal genome and secreted from the pathogen into the host’s cell apoplast or within the cells. This study uses a comparative genomic approach to elucidate putative effectors and determine their evolutionary histories. We used OrthoMCL to identify nearly 20,000 gene families in proteomes of sixteen diverse fungal species, which include fifteen basidiomycetes and one ascomycete. We inferred patterns of duplication and loss for each gene family and identified families with distinctive patterns of expansion/contraction associated with the evolution of rust fungal genomes. To recognize potential contributors for the unique features of rust pathogens, we identified families harboring secreted proteins that: (i) arose or expanded in rust pathogens relative to other fungi, or (ii) contracted or were lost in rust fungal genomes. While the origin of rust fungi appears to be associated with considerable gene loss, there are many gene duplications associated with each sampled rust fungal genome. We also highlight two putative effector gene families that have expanded in Cqf that we hypothesize have roles in pathogenicity.

  15. Duplications and losses in gene families of rust pathogens highlight putative effectors.

    Science.gov (United States)

    Pendleton, Amanda L; Smith, Katherine E; Feau, Nicolas; Martin, Francis M; Grigoriev, Igor V; Hamelin, Richard; Nelson, C Dana; Burleigh, J Gordon; Davis, John M

    2014-01-01

    Rust fungi are a group of fungal pathogens that cause some of the world's most destructive diseases of trees and crops. A shared characteristic among rust fungi is obligate biotrophy, the inability to complete a lifecycle without a host. This dependence on a host species likely affects patterns of gene expansion, contraction, and innovation within rust pathogen genomes. The establishment of disease by biotrophic pathogens is reliant upon effector proteins that are encoded in the fungal genome and secreted from the pathogen into the host's cell apoplast or within the cells. This study uses a comparative genomic approach to elucidate putative effectors and determine their evolutionary histories. We used OrthoMCL to identify nearly 20,000 gene families in proteomes of 16 diverse fungal species, which include 15 basidiomycetes and one ascomycete. We inferred patterns of duplication and loss for each gene family and identified families with distinctive patterns of expansion/contraction associated with the evolution of rust fungal genomes. To recognize potential contributors for the unique features of rust pathogens, we identified families harboring secreted proteins that: (i) arose or expanded in rust pathogens relative to other fungi, or (ii) contracted or were lost in rust fungal genomes. While the origin of rust fungi appears to be associated with considerable gene loss, there are many gene duplications associated with each sampled rust fungal genome. We also highlight two putative effector gene families that have expanded in Cqf that we hypothesize have roles in pathogenicity.
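
    The four categories named in the abstract (arose, expanded, contracted, lost) can be illustrated with a simple comparison of per-species gene counts. The abstract does not give the inference method, so this mean-count comparison is a simplified stand-in; the species labels, counts, and fold-change cutoff are invented for illustration.

```python
from statistics import mean


def classify_family(counts, rust_species, fold=2.0):
    """Label a gene family by mean copy number in rust vs. other species.

    counts: dict mapping species name -> gene copies in this family.
    rust_species: set of species names treated as rust fungi (assumed labels).
    Both groups are assumed to contain at least one species.
    """
    rust = [n for sp, n in counts.items() if sp in rust_species]
    other = [n for sp, n in counts.items() if sp not in rust_species]
    rust_mean, other_mean = mean(rust), mean(other)
    if rust_mean == 0 and other_mean == 0:
        return "no clear change"
    if rust_mean == 0:
        return "lost in rusts"
    if other_mean == 0:
        return "arose in rusts"
    if rust_mean >= fold * other_mean:
        return "expanded in rusts"
    if other_mean >= fold * rust_mean:
        return "contracted in rusts"
    return "no clear change"


# Hypothetical copy-number table for three families in four species.
rust_species = {"rust_A", "rust_B"}
families = {
    "fam1": {"rust_A": 6, "rust_B": 8, "other_A": 2, "other_B": 1},
    "fam2": {"rust_A": 0, "rust_B": 0, "other_A": 3, "other_B": 2},
    "fam3": {"rust_A": 2, "rust_B": 2, "other_A": 2, "other_B": 3},
}
for name, counts in families.items():
    print(name, classify_family(counts, rust_species))
```

    A count-based screen like this ignores the species tree, so it cannot distinguish one ancient duplication from several recent ones.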

  16. Genome-Wide Analysis of the AP2/ERF Family in Eucalyptus grandis: An Intriguing Over-Representation of Stress-Responsive DREB1/CBF Genes

    Science.gov (United States)

    SanClemente, H.; Mounet, F.; Dunand, C.; Marque, G.; Marque, C.; Teulières, C.

    2015-01-01

    Background: The AP2/ERF family includes a large number of developmentally and physiologically important transcription factors sharing an AP2 DNA-binding domain. Among them, DREB1/CBF and DREB2 factors are known as master regulators of cold and heat/osmotic stress responses, respectively. Experimental Approaches: The manual annotation of the AP2/ERF family from Eucalyptus grandis, Malus, Populus and Vitis genomes allowed a complete phylogenetic study for comparing the structure of this family in woody species and the model Arabidopsis thaliana. Expression profiles of the whole groups of EgrDREB1 and EgrDREB2 were investigated through RNAseq database survey and RT-qPCR analyses. Results: The structure and the size of the AP2/ERF family show a global conservation for the plant species under comparison. In addition to an expansion of the ERF subfamily, the tree genomes mainly differ with respect to the group representation within the subfamilies. With regard to the E. grandis DREB subfamily, an obvious feature is the presence of 17 DREB1/CBF genes, the maximum reported to date for dicotyledons. In contrast, only six DREB2 have been identified, which is similar to the other plant species under study, except for Malus. All the DREB1/CBF and DREB2 genes from E. grandis are expressed in at least one condition and all are heat-responsive. Regulation by cold and drought depends on the genes but is not specific to one group; the DREB1/CBF group is more cold-inducible than DREB2, which is mainly drought-responsive. Conclusion: These features suggest that the dramatic expansion of the DREB1/CBF group might be related to the adaptation of this evergreen tree to climate changes when it expanded in Australia. PMID:25849589

  17. Genome-wide analysis of the AP2/ERF family in Eucalyptus grandis: an intriguing over-representation of stress-responsive DREB1/CBF genes.

    Directory of Open Access Journals (Sweden)

    P B Cao

    The AP2/ERF family includes a large number of developmentally and physiologically important transcription factors sharing an AP2 DNA-binding domain. Among them, DREB1/CBF and DREB2 factors are known as master regulators of cold and heat/osmotic stress responses, respectively. The manual annotation of the AP2/ERF family from Eucalyptus grandis, Malus, Populus and Vitis genomes allowed a complete phylogenetic study for comparing the structure of this family in woody species and the model Arabidopsis thaliana. Expression profiles of the whole groups of EgrDREB1 and EgrDREB2 were investigated through RNAseq database survey and RT-qPCR analyses. The structure and the size of the AP2/ERF family show a global conservation for the plant species under comparison. In addition to an expansion of the ERF subfamily, the tree genomes mainly differ with respect to the group representation within the subfamilies. With regard to the E. grandis DREB subfamily, an obvious feature is the presence of 17 DREB1/CBF genes, the maximum reported to date for dicotyledons. In contrast, only six DREB2 have been identified, which is similar to the other plant species under study, except for Malus. All the DREB1/CBF and DREB2 genes from E. grandis are expressed in at least one condition and all are heat-responsive. Regulation by cold and drought depends on the genes but is not specific to one group; the DREB1/CBF group is more cold-inducible than DREB2, which is mainly drought-responsive. These features suggest that the dramatic expansion of the DREB1/CBF group might be related to the adaptation of this evergreen tree to climate changes when it expanded in Australia.

  18. NoGOA: predicting noisy GO annotations using evidences and sparse representation.

    Science.gov (United States)

    Yu, Guoxian; Lu, Chang; Wang, Jun

    2017-07-21

    Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, how to identify noisy annotations is an important but seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA.
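
    The neighbour-voting step described in this abstract can be illustrated with a minimal sketch. It is not the NoGOA code: cosine similarity over annotation sets is swapped in for the sparse representation coefficients, evidence-code weighting and GO hierarchy propagation are omitted, and the gene names, term labels, `k`, and the support cutoff are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term sets (binary annotation vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def noisy_candidates(annotations, k=2, min_support=0.5):
    """Flag (gene, term) pairs that few of the gene's nearest neighbours share.

    annotations: dict mapping gene -> set of annotated terms.
    Returns a sorted list of (gene, term) pairs whose neighbour support
    falls below min_support.
    """
    flagged = []
    for gene, terms in annotations.items():
        # Rank the other genes by similarity and keep the k most similar.
        others = [g for g in annotations if g != gene]
        others.sort(key=lambda g: cosine(terms, annotations[g]), reverse=True)
        neighbours = others[:k]
        if not neighbours:
            continue
        for term in terms:
            votes = sum(term in annotations[n] for n in neighbours)
            if votes / len(neighbours) < min_support:
                flagged.append((gene, term))
    return sorted(flagged)


# Hypothetical toy annotations: T9 on geneC is shared by none of its neighbours.
toy = {
    "geneA": {"T1", "T2", "T3"},
    "geneB": {"T1", "T2", "T3"},
    "geneC": {"T1", "T2", "T3", "T9"},
    "geneD": {"T7", "T8"},
    "geneE": {"T7", "T8"},
    "geneF": {"T7", "T8"},
}
print(noisy_candidates(toy))  # [('geneC', 'T9')]
```

    An annotation is flagged only because similar genes lack it, so a genuinely novel function would also be flagged; this is one reason the abstract combines votes with evidence-code weights rather than relying on votes alone.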

  19. Common mutations identified in the MLH1 gene in familial Lynch syndrome

    Directory of Open Access Journals (Sweden)

    Jisha Elias

    2017-12-01

    In this study we identified three families with Lynch syndrome from a rural cancer center in western India (KCHRC, Goraj, Gujarat), where 70-75 CRC patients are seen annually. DNA isolated from the blood of consenting family members of all three families (8-10 members/family) was subjected to NGS on an Illumina HiSeq 4000 platform. We identified unique mutations in the MLH1 gene in all three HNPCC family members. Two of the three unrelated families shared a common mutation (154delA and 156delA). In total, 8 members of one family were identified as carriers of the 156delA mutation, of whom 5 were unaffected and 3 were affected (age of onset: 1 member <30 yr and 2 members >40 yr). The family with the 154delA mutation had 2 affected members (>40 yr) carrying the mutation. The LYS618DEL mutation, found in 8 members of the third family, was carried by both affected and unaffected members. Thus, the common mutations identified in the MLH1 gene in two unrelated families conferred a high risk of Lynch syndrome, especially above the age of 40.

  20. Understanding the mechanisms of ATPase beta family genes for cellular thermotolerance in crossbred bulls.

    Science.gov (United States)

    Deb, Rajib; Sajjanar, Basavaraj; Singh, Umesh; Alex, Rani; Raja, T V; Alyethodi, Rafeeque R; Kumar, Sushil; Sengar, Gyanendra; Sharma, Sheetal; Singh, Rani; Prakash, B

    2015-12-01

    Na+/K+-ATPase is an integral membrane protein composed of a large catalytic subunit (alpha), a smaller glycoprotein subunit (beta), and a gamma subunit. The beta subunit is essential for ion recognition as well as maintenance of membrane integrity. The present study aimed to analyze the expression pattern of ATPase beta subunit genes (ATPase B1, ATPase B2, and ATPase B3) among crossbred bulls under different ambient temperatures (20-44 °C). It also aimed to examine the relationship of HSP70 with the ATPase beta family genes. Our results demonstrated that among beta family genes, transcript abundance of ATPase B1 and ATPase B2 is significantly (P ... ATPase B1, ATPase B2, and ATPase B3 is highly correlated (P ... ATPase beta family genes for cellular thermotolerance in cattle.

  1. Genome-Wide Analysis of the RNA Helicase Gene Family in Gossypium raimondii

    Directory of Open Access Journals (Sweden)

    Jie Chen

    2014-03-01

    RNA helicases, which help to unwind stable RNA duplexes and have important roles in RNA metabolism, belong to a class of motor proteins that play important roles in plant development and responses to stress. Although this family of genes has been the subject of systematic investigation in Arabidopsis, rice, and tomato, it has not yet been characterized in cotton. In this study, we identified 161 putative RNA helicase genes in the genome of the diploid cotton species Gossypium raimondii. We classified these genes into three subfamilies, based on the presence of either a DEAD-box (51 genes), DEAH-box (52 genes), or DExD/H-box (58 genes) in their coding regions. Chromosome location analysis showed that the genes that encode RNA helicases are distributed across all 13 chromosomes of G. raimondii. Syntenic analysis revealed that 62 of the 161 G. raimondii helicase genes (38.5%) are within the identified syntenic blocks. Sixty-six (40.99%) helicase genes from G. raimondii have one or several putative orthologs in tomato. Additionally, GrDEADs have more conserved gene structures and simpler domains than GrDEAHs and GrDExD/Hs. Transcriptome sequencing data demonstrated that many of these helicases, especially GrDEADs, are highly expressed at the fiber initiation stage and in mature leaves. To our knowledge, this is the first report of a genome-wide analysis of the RNA helicase gene family in cotton.
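
    The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The variable and function names are illustrative only; the numbers are those given in the abstract.

```python
# Subfamily sizes reported in the abstract.
subfamilies = {"DEAD-box": 51, "DEAH-box": 52, "DExD/H-box": 58}
total = sum(subfamilies.values())
print(total)  # 161


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


print(f"{percent(62, total):.1f}%")  # 38.5%  (genes within syntenic blocks)
print(f"{percent(66, total):.2f}%")  # 40.99% (genes with tomato orthologs)
```

    The three subfamilies sum to the stated 161 genes, and both percentages match the figures quoted in the text.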

  2. Genome-wide identification, characterization and phylogenetic analysis of 50 catfish ATP-binding cassette (ABC) transporter genes.

    Science.gov (United States)

    Liu, Shikai; Li, Qi; Liu, Zhanjiang

    2013-01-01

    Although a large set of full-length transcripts was recently assembled in catfish, annotation of large gene families, especially those with duplications, is still a great challenge. Most often, complexities in annotation cause mis-identification and thereby much confusion in the scientific literature. As such, detailed phylogenetic analysis and/or orthology analysis are required for annotation of genes that belong to gene families. The ATP-binding cassette (ABC) transporter gene superfamily is a large gene family that encodes membrane proteins that transport a diverse set of substrates across membranes, playing important roles in protecting organisms from diverse environments. In this work, we identified a set of 50 ABC transporters in the catfish genome. Phylogenetic analysis allowed their identification and annotation into seven subfamilies, including 9 ABCA genes, 12 ABCB genes, 12 ABCC genes, 5 ABCD genes, 2 ABCE genes, 4 ABCF genes and 6 ABCG genes. Most ABC transporters are conserved among vertebrates, though cases of recent gene duplications and gene losses do exist. Gene duplications in catfish were found for ABCA1, ABCB3, ABCB6, ABCC5, ABCD3, ABCE1, ABCF2 and ABCG2. The whole set of catfish ABC transporters provides the essential genomic resources for future biochemical, toxicological and physiological studies of ABC drug efflux transporters. The establishment of orthologies should allow functional inferences with the information from model species, though the function of lineage-specific genes can be distinct because of specific living environments with different selection pressures.

  3. Characterization of the avian Trojan gene family reveals contrasting evolutionary constraints.

    Directory of Open Access Journals (Sweden)

    Petar Petrov

    "Trojan" is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules.

  4. Characterization of the avian Trojan gene family reveals contrasting evolutionary constraints.

    Science.gov (United States)

    Petrov, Petar; Syrjänen, Riikka; Smith, Jacqueline; Gutowska, Maria Weronika; Uchida, Tatsuya; Vainio, Olli; Burt, David W

    2015-01-01

    "Trojan" is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules.

  5. Enamelin/ameloblastin gene polymorphisms in autosomal amelogenesis imperfecta among Syrian families.

    Science.gov (United States)

    Dashash, Mayssoon; Bazrafshani, Mohamed Riza; Poulton, Kay; Jaber, Saaed; Naeem, Emad; Blinkhorn, Anthony Stevenson

    2011-02-01

    This study was undertaken to investigate whether a single G deletion within a series of seven G residues (codon 196) at the exon 9-intron 9 boundary of the enamelin gene ENAM and a tri-nucleotide deletion at codon 180 in exon 7 (GGA vs deletion) of the ameloblastin gene AMBN could have a role in autosomal amelogenesis imperfecta among affected Syrian families. A new technique, size-dependent deletion screening, was developed to detect nucleotide deletions in the ENAM and AMBN genes. Twelve Syrian families with autosomal-dominant or -recessive amelogenesis imperfecta were included. A homozygous/heterozygous mutation in the ENAM gene (152/152, 152/153) was identified in affected members of three families with autosomal-dominant amelogenesis imperfecta and one family with autosomal-recessive amelogenesis imperfecta. A heterozygous mutation (222/225) in the AMBN gene was identified. However, no disease-causing mutation was found. The present findings provide useful information for the implication of ENAM gene polymorphism in autosomal-dominant/-recessive amelogenesis imperfecta. Further investigations are required to identify other genes responsible for the various clinical phenotypes. © 2010 Blackwell Publishing Asia Pty Ltd.

  6. Transcriptomic and phylogenetic analysis of Culex pipiens quinquefasciatus for three detoxification gene families

    Directory of Open Access Journals (Sweden)

    Yan Liangzhen

    2012-11-01

    Background: The genomes of three major mosquito vectors of human diseases, Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been previously sequenced. C. p. quinquefasciatus has the largest number of predicted protein-coding genes, which partially results from the expansion of three detoxification gene families: cytochrome P450 monooxygenases (P450), glutathione S-transferases (GST), and carboxyl/cholinesterases (CCE). However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, C. p. quinquefasciatus has limited transcriptomic resources. Knowledge of complete gene expression information is very important for the exploration of the functions of genes involved in specific biological processes. In the present study, the three detoxification gene families of C. p. quinquefasciatus were analyzed for phylogenetic classification and compared with those of three other dipteran insects. Gene expression during various developmental stages and the differential expression responsible for parathion resistance were profiled using the digital gene expression (DGE) technique. Results: A total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 cytosolic GST genes. Compared with three other dipteran species, gene expansion in Culex mainly occurred in the CCE and P450 families, where the genes of α-esterases, juvenile hormone esterases, and CYP325 of the CYP4 subfamily showed the most pronounced expansion on the genome. For the five DGE libraries, 3.5-3.8 million raw tags were generated and mapped to 13314 reference genes. Among 302 detoxification genes, 225 (75%) were detected for expression in at least one DGE library. One fourth of the CCE and P450 genes were detected uniquely in one stage, indicating potential developmentally regulated expression. A total of 1511 genes showed different expression levels between a parathion-resistant and a

  7. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
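    The idea of comparing several annotations of one genome can be illustrated with a minimal Python sketch. This is not BEACON's actual algorithm or interface; the function name `compare_annotations`, the status labels, and the example gene and method names are all assumptions made for illustration. The sketch only shows how a function call from one method can fill in a gene that another method left as "hypothetical protein", and how disagreements can be flagged.

```python
from typing import Dict, List

# Labels treated as "no functional information" (an assumption for this sketch).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}


def compare_annotations(annotations: Dict[str, Dict[str, str]]) -> Dict[str, dict]:
    """Compare per-gene function calls from several annotation methods.

    annotations maps a method name to {gene_id: function_description}.
    Returns {gene_id: {"status": ..., "functions": [...]}} where status is
    "unknown" (no method gives a function), "agreed" (one distinct
    function across methods) or "conflict" (several distinct functions).
    """
    gene_ids = sorted({g for calls in annotations.values() for g in calls})
    report: Dict[str, dict] = {}
    for gene in gene_ids:
        calls: List[str] = [a.get(gene, "") for a in annotations.values()]
        informative = sorted({c for c in calls if c.lower() not in UNINFORMATIVE})
        if not informative:
            status = "unknown"
        elif len(informative) == 1:
            status = "agreed"
        else:
            status = "conflict"
        report[gene] = {"status": status, "functions": informative}
    return report


# Made-up example input: two hypothetical methods annotating three genes.
EXAMPLE = {
    "method_a": {"g1": "hypothetical protein", "g2": "DNA ligase"},
    "method_b": {"g1": "ABC transporter", "g2": "DNA ligase", "g3": "hypothetical protein"},
}
REPORT = compare_annotations(EXAMPLE)
```

    In this toy input, gene g1 gains a putative function from the second method, which is the sense in which a combined annotation can extend a single one.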

  8. First generation annotations for the fathead minnow (Pimephales promelas) genome

    Science.gov (United States)

    Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...

  9. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...

  10. Identification and description of three families with familial Alzheimer disease that segregate variants in the SORL1 gene.

    Science.gov (United States)

    Thonberg, Håkan; Chiang, Huei-Hsin; Lilius, Lena; Forsell, Charlotte; Lindström, Anna-Karin; Johansson, Charlotte; Björkström, Jenny; Thordardottir, Steinunn; Sleegers, Kristel; Van Broeckhoven, Christine; Rönnbäck, Annica; Graff, Caroline

    2017-06-09

    Alzheimer disease (AD) is a progressive neurodegenerative disorder and the most common form of dementia. The majority of AD cases are sporadic, while up to 5% are families with an early onset AD (EOAD). Mutations in one of the three genes: amyloid beta precursor protein (APP), presenilin 1 (PSEN1) or presenilin 2 (PSEN2) can be disease causing. However, most EOAD families do not carry mutations in any of these three genes, and candidate genes, such as the sortilin-related receptor 1 (SORL1), have been suggested to be potentially causative. To identify AD causative variants, we performed whole-exome sequencing on five individuals from a family with EOAD and a missense variant, p.Arg1303Cys (c.3907C > T) was identified in SORL1 which segregated with disease and was further characterized with immunohistochemistry on two post mortem autopsy cases from the same family. In a targeted re-sequencing effort on independent index patients from 35 EOAD-families, a second SORL1 variant, c.3050-2A > G, was found which segregated with the disease in 3 affected and was absent in one unaffected family member. The c.3050-2A > G variant is located two nucleotides upstream of exon 22 and was shown to cause exon 22 skipping, resulting in a deletion of amino acids Gly1017- Glu1074 of SORL1. Furthermore, a third SORL1 variant, c.5195G > C, recently identified in a Swedish case control cohort included in the European Early-Onset Dementia (EU EOD) consortium study, was detected in two affected siblings in a third family with familial EOAD. The finding of three SORL1-variants that segregate with disease in three separate families with EOAD supports the involvement of SORL1 in AD pathology. The cause of these rare monogenic forms of EOAD has proven difficult to find and the use of exome and genome sequencing may be a successful route to target them.

  11. Diverse expression of sucrose transporter gene family in Zea mays

    Indian Academy of Sciences (India)

    2015-03-04

    Mar 4, 2015 ... In this study, we identified four sucrose transporter genes. (ZmSUT1 .... strand synthesis was done with forward and reverse primers designed at .... Qazi H. A., Paranjpe S. and Bhargava S. 2012 Stem sugar accu- mulation in ...

  12. Gene Panel Testing in Epileptic Encephalopathies and Familial Epilepsies

    DEFF Research Database (Denmark)

    Møller, Rikke S.; Larsen, Line H.G.; Johannesen, Katrine M.

    2016-01-01

    -causing variant in 49 (23%) of the 216 patients. The variants were found in 19 different genes including SCN1A, STXBP1, CDKL5, SCN2A, SCN8A, GABRA1, KCNA2, and STX1B. Patients with neonatal-onset epilepsies had the highest rate of positive findings (57%). The overall yield for patients with EEs was 32%, compared...

  13. Gene Panel Testing in Epileptic Encephalopathies and Familial Epilepsies

    DEFF Research Database (Denmark)

    Møller, Rikke S; Larsen, Line H G; Johannesen, Katrine M

    2016-01-01

    In recent years, several genes have been causally associated with epilepsy. However, making a genetic diagnosis in a patient can still be difficult, since extensive phenotypic and genetic heterogeneity has been observed in many monogenic epilepsies. This study aimed to analyze the genetic basis o...

  14. Genetic diversity of bitter taste receptor gene family in Sichuan ...

    Indian Academy of Sciences (India)

    Previous research had revealed that chicken has only three bitter taste receptor genes (Tas2r1, ... Journal of Genetics, DOI 10.1007/s12041-016-0684-4, Vol. ..... between red-winged blackbirds and European starlings. ... Academic Press,.

  15. The zebrafish progranulin gene family and antisense transcripts

    Directory of Open Access Journals (Sweden)

    Baranowski David

    2005-11-01

    Background: Progranulin is an epithelial tissue growth factor (also known as proepithelin, acrogranin and PC-cell-derived growth factor) that has been implicated in development, wound healing and in the progression of many cancers. The single mammalian progranulin gene encodes a glycoprotein precursor consisting of seven and one half tandemly repeated non-identical copies of the cystine-rich granulin motif. A genome-wide duplication event hypothesized to have occurred at the base of the teleost radiation predicts that mammalian progranulin may be represented by two co-orthologues in zebrafish. Results: The cDNAs encoding two zebrafish granulin precursors, progranulins-A and -B, were characterized and found to contain 10 and 9 copies of the granulin motif respectively. The cDNAs and genes encoding the two forms of granulin, progranulins-1 and -2, were also cloned and sequenced. Both latter peptides were found to be encoded by precursors with a simplified architecture consisting of one and one half copies of the granulin motif. A cDNA encoding a chimeric progranulin which likely arises through the mechanism of trans-splicing between grn1 and grn2 was also characterized. A non-coding RNA gene with antisense complementarity to both grn1 and grn2 was identified which may have functional implications with respect to gene dosage, as well as in restricting the formation of the chimeric form of progranulin. Chromosomal localization of the four progranulin (grn) genes reveals syntenic conservation for grna only, suggesting that it is the true orthologue of mammalian grn. RT-PCR and whole-mount in situ hybridization analysis of zebrafish grns during development reveals that combined expression of grna and grnb, but not grn1 and grn2, recapitulates many of the expression patterns observed for the murine counterpart. This includes maternal deposition, widespread central nervous system distribution and specific localization within the epithelial

  16. Annotating functional RNAs in genomes using Infernal.

    Science.gov (United States)

    Nawrocki, Eric P

    2014-01-01

    Many different types of functional non-coding RNAs participate in a wide range of important cellular functions but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools utilize covariance models (CMs), statistical models of the conserved sequence, and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.
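    As a hedged illustration of the downstream step, the Python sketch below filters hits from an Infernal tabular output file. It assumes the default whitespace-delimited `--tblout` layout as produced by `cmscan` (target name, accession, query name, accession, mdl, mdl from, mdl to, seq from, seq to, strand, trunc, pass, gc, bias, score, E-value, inc, description), with the Rfam family as target and the genome sequence as query. The `Hit` type, the `parse_tblout` helper, the E-value cutoff, and the two sample rows are made up for illustration and are not taken from the chapter.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    family: str      # target name (Rfam family when using cmscan)
    sequence: str    # query name (genome sequence when using cmscan)
    start: int       # "seq from"
    end: int         # "seq to"
    strand: str
    score: float
    evalue: float


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> Iterator[Hit]:
    """Yield hits with E-value <= max_evalue from tblout-style lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 17)  # last field is a free-text description
        if len(fields) < 17:
            continue  # not a complete hit row
        hit = Hit(
            family=fields[0],
            sequence=fields[2],
            start=int(fields[7]),
            end=int(fields[8]),
            strand=fields[9],
            score=float(fields[14]),
            evalue=float(fields[15]),
        )
        if hit.evalue <= max_evalue:
            yield hit


# Made-up sample rows in the assumed column order (not real results).
SAMPLE = [
    "#target name accession query name accession mdl ...",
    "tRNA RF00005 contig1 - cm 1 71 1200 1272 + no 1 0.55 0.0 62.3 1.2e-12 ! -",
    "5S_rRNA RF00001 contig1 - cm 1 119 5000 4882 - no 1 0.50 0.0 12.1 0.5 ? -",
]
HITS = list(parse_tblout(SAMPLE))
```

    Hits that pass the cutoff can then be compared against the genome's existing annotation to find RNAs that were not previously annotated.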

  17. The SULTR gene family in maize (Zea mays L.): Gene cloning and expression analyses under sulfate starvation and abiotic stress.

    Science.gov (United States)

    Huang, Qin; Wang, Meiping; Xia, Zongliang

    2018-01-01

    Sulfur is an essential macronutrient required for plant growth, development and stress responses. The family of sulfate transporters (SULTRs) mediates the uptake and translocation of sulfate in higher plants. However, basic knowledge of the SULTR gene family in maize (Zea mays L.) is scarce. In this study, a genome-wide bioinformatic analysis of SULTR genes in maize was conducted, and the developmental expression patterns of the genes and their responses to sulfate starvation and abiotic stress were further investigated. The ZmSULTR family includes eight putative members in the maize genome and is clustered into four groups in the phylogenetic tree. These genes displayed differential expression patterns in various organs of maize. For example, expression of ZmSULTR1;1 and ZmSULTR4;1 was high in roots, and transcript levels of ZmSULTR3;1 and ZmSULTR3;3 were high in shoots. Expression of ZmSULTR1;2, ZmSULTR2;1, ZmSULTR3;3, and ZmSULTR4;1 was high in flowers. Also, these eight genes showed differential responses to sulfate deprivation in roots and shoots of maize seedlings. Transcript levels of ZmSULTR1;1, ZmSULTR1;2, and ZmSULTR3;4 were significantly increased in roots during 12-day-sulfate starvation stress, while ZmSULTR3;3 and ZmSULTR3;5 only showed an early response pattern in shoots. In addition, dynamic transcriptional changes determined via qPCR revealed differential expression profiles of these eight ZmSULTR genes in response to environmental stresses such as salt, drought, and heat stresses. Notably, all the genes, except for ZmSULTR3;3, were induced by drought and heat stresses. However, a few genes were induced by salt stress. Physiological determination showed that two important thiol-containing compounds, cysteine and glutathione, increased significantly under these abiotic stresses. The results suggest that members of the SULTR family might function in adaptations to sulfur deficiency stress and adverse growing environments. This study will lay a

  18. Structural and Functional Annotation of Hypothetical Proteins of O139

    Directory of Open Access Journals (Sweden)

    Md. Saiful Islam

    2015-06-01

    In developing countries, the threat of cholera is a significant health concern whenever water purification and sewage disposal systems are inadequate. Vibrio cholerae is one of the bacteria responsible for cholera. The complete genome sequence of V. cholerae reveals the presence of various genes and hypothetical proteins whose functions are not yet understood. Hence, analyzing and annotating the structure and function of hypothetical proteins is important for understanding V. cholerae. V. cholerae O139 is the most common and pathogenic bacterial strain among various V. cholerae strains. In this study, the sequences of six hypothetical proteins of V. cholerae O139 from NCBI have been annotated. Various computational tools and databases have been used to determine domain family, protein-protein interaction, solubility of protein, ligand binding sites etc. The three-dimensional structures of two proteins were modeled and their ligand binding sites were identified. We have found domains and families of only one protein. The analysis revealed that these proteins might have antibiotic resistance activity, DNA breaking-rejoining activity, integrase enzyme activity, restriction endonuclease activity, etc. Structural prediction of these proteins and detection of binding sites from this study would indicate a potential target aiding docking studies for therapeutic designing against cholera.

  19. Genome-Wide Analysis of the Aquaporin Gene Family in Chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Deokar, Amit A; Tar'an, Bunyamin

    2016-01-01

    Aquaporins (AQPs) are essential membrane proteins that play a critical role in the transport of water and many other solutes across cell membranes. In this study, a comprehensive genome-wide analysis identified 40 AQP genes in chickpea (Cicer arietinum L.). A complete overview of the chickpea AQP (CaAQP) gene family is presented, including their chromosomal locations, gene structure, phylogeny, gene duplication, conserved functional motifs, gene expression, and conserved promoter motifs. To understand AQP's evolution, a comparative analysis of chickpea AQPs with AQP orthologs from soybean, Medicago, common bean, and Arabidopsis was performed. The chickpea AQP genes were found on all of the chickpea chromosomes, except chromosome 7, with a maximum of six genes on chromosome 6, and a minimum of one gene on chromosome 5. Gene duplication analysis indicated that the expansion of chickpea AQP gene family might have been due to segmental and tandem duplications. CaAQPs were grouped into four subfamilies including 15 NOD26-like intrinsic proteins (NIPs), 13 tonoplast intrinsic proteins (TIPs), eight plasma membrane intrinsic proteins (PIPs), and four small basic intrinsic proteins (SIPs) based on sequence similarities and phylogenetic position. Gene structure analysis revealed a highly conserved exon-intron pattern within CaAQP subfamilies supporting the CaAQP family classification. Functional prediction based on conserved Ar/R selectivity filters, Froger's residues, and specificity-determining positions suggested wide differences in substrate specificity among the subfamilies of CaAQPs. Expression analysis of the AQP genes indicated that some of the genes are tissue-specific, whereas few other AQP genes showed differential expression in response to biotic and abiotic stresses. Promoter profiling of CaAQP genes for conserved cis-acting regulatory elements revealed enrichment of cis-elements involved in circadian control, light response, defense and stress responsiveness

  20. Genome-wide evolutionary characterization and expression analyses of WRKY family genes in Brachypodium distachyon.

    Science.gov (United States)

    Wen, Feng; Zhu, Hong; Li, Peng; Jiang, Min; Mao, Wenqing; Ong, Chermaine; Chu, Zhaoqing

    2014-06-01

    Members of plant WRKY gene family are ancient transcription factors that function in plant growth and development and respond to biotic and abiotic stresses. In our present study, we have investigated WRKY family genes in Brachypodium distachyon, a new model plant of family Poaceae. We identified a total of 86 WRKY genes from B. distachyon and explored their chromosomal distribution and evolution, domain alignment, promoter cis-elements, and expression profiles. Combining the analysis of phylogenetic tree of BdWRKY genes and the result of expression profiling, results showed that most of clustered gene pairs had higher similarities in the WRKY domain, suggesting that they might be functionally redundant. Neighbour-joining analysis of 301 WRKY domains from Oryza sativa, Arabidopsis thaliana, and B. distachyon suggested that BdWRKY domains are evolutionarily more closely related to O. sativa WRKY domains than those of A. thaliana. Moreover, tissue-specific expression profile of BdWRKY genes and their responses to phytohormones and several biotic or abiotic stresses were analysed by quantitative real-time PCR. The results showed that the expression of BdWRKY genes was rapidly regulated by stresses and phytohormones, and there was a strong correlation between promoter cis-elements and the phytohormones-induced BdWRKY gene expression. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  1. Targeted sequencing of established and candidate colorectal cancer genes in the Colon Cancer Family Registry Cohort.

    Science.gov (United States)

    Raskin, Leon; Guo, Yan; Du, Liping; Clendenning, Mark; Rosty, Christophe; Lindor, Noralane M; Gruber, Stephen B; Buchanan, Daniel D

    2017-11-07

    The underlying genetic cause of colorectal cancer (CRC) can be identified for 5-10% of all cases, while at least 20% of CRC cases are thought to be due to inherited genetic factors. Screening for highly penetrant mutations in genes associated with Mendelian cancer syndromes using next-generation sequencing (NGS) can be prohibitively expensive for studies requiring large sample sizes. The aim of the study was to identify rare single nucleotide variants and small indels in 40 established or candidate CRC susceptibility genes in 1,046 familial CRC cases (including both MSS and MSI-H tumor subtypes) and 1,006 unrelated controls from the Colon Cancer Family Registry Cohort using a robust and cost-effective DNA pooling NGS strategy. We identified 264 variants in 38 genes that were observed only in cases, comprising either very rare (minor allele frequency cancer susceptibility genes BAP1, CDH1, CHEK2, ENG, and MSH3. For the candidate CRC genes, we identified likely pathogenic variants in the helicase domain of POLQ and in the LRIG1, SH2B3, and NOS1 genes and present their clinicopathological characteristics. Using a DNA pooling NGS strategy, we identified novel germline mutations in established CRC susceptibility genes in familial CRC cases. Further studies are required to support the role of POLQ, LRIG1, SH2B3 and NOS1 as CRC susceptibility genes.

  2. Contributions to In Silico Genome Annotation

    KAUST Repository

    Kalkatawi, Manal M.

    2017-11-30

    , we focus on deriving a model capable of facilitating the functional annotation of prokaryotes. As far as we know, there is no fully automated system for detailed comparison of functional annotations generated by different methods. Hence, we developed BEACON, a method and supporting system that compares gene annotation from various methods to produce a more reliable and comprehensive annotation. Overall, our research contributed to different aspects of the genome annotation.

  3. Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.).

    Science.gov (United States)

    Xiong, Wangdan; Xu, Xueqin; Zhang, Lin; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

    2013-07-25

    The WRKY proteins, which contain highly conserved WRKYGQK amino acid sequences and zinc-finger-like motifs, constitute a large family of transcription factors in plants. They participate in diverse physiological and developmental processes. WRKY genes have been identified and characterized in a number of plant species. We identified a total of 58 WRKY genes (JcWRKY) in the genome of the physic nut (Jatropha curcas L.). On the basis of their conserved WRKY domain sequences, all of the JcWRKY proteins could be assigned to one of the previously defined groups, I-III. Phylogenetic analysis of JcWRKY genes with Arabidopsis and rice WRKY genes, and separately with castor bean WRKY genes, revealed no evidence of recent gene duplication in the JcWRKY gene family. Transcript abundance of JcWRKY gene products was tested in different tissues under normal growth conditions. In addition, 47 WRKY genes responded to at least one abiotic stress (drought, salinity, phosphate starvation and nitrogen starvation) in individual tissues (leaf, root and/or shoot cortex). Our study provides a useful reference data set as the basis for cloning and functional analysis of physic nut WRKY genes. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. Genome-wide investigation and transcriptome analysis of the WRKY gene family in Gossypium.

    Science.gov (United States)

    Ding, Mingquan; Chen, Jiadong; Jiang, Yurong; Lin, Lifeng; Cao, YueFen; Wang, Minhua; Zhang, Yuting; Rong, Junkang; Ye, Wuwei

    2015-02-01

    WRKY transcription factors play important roles in various stress responses in diverse plant species. In cotton, this family has not been well studied, especially in relation to fiber development. Here, the genomes and transcriptomes of Gossypium raimondii and Gossypium arboreum were investigated to identify fiber development related WRKY genes. This represents the first comprehensive comparative study of WRKY transcription factors in both diploid A and D cotton species. In total, 112 G. raimondii and 109 G. arboreum WRKY genes were identified. No significant gene structure or domain alterations were detected between the two species, but many SNPs distributed unequally in exon and intron regions. Physical mapping revealed that the WRKY genes in G. arboreum were not located in the corresponding chromosomes of G. raimondii, suggesting great chromosome rearrangement in the diploid cotton genomes. The cotton WRKY genes, especially subgroups I and II, have expanded through multiple whole genome duplications and tandem duplications compared with other plant species. Sequence comparison showed many functionally divergent sites between WRKY subgroups, while the genes within each group are under strong purifying selection. Transcriptome analysis suggested that many WRKY genes participate in specific fiber development processes such as fiber initiation, elongation and maturation with different expression patterns between species. Complex WRKY gene expression such as differential Dt and At allelic gene expression in G. hirsutum and alternative splicing events were also observed in both diploid and tetraploid cottons during fiber development process. In conclusion, this study provides important information on the evolution and function of WRKY gene family in cotton species.

  5. Gene mapping in an anophthalmic pedigree of a consanguineous Pakistani family opened new horizons for research

    Directory of Open Access Journals (Sweden)

    Saleha S

    2016-06-01

    Clinical anophthalmia is a rare inherited disease of the eye and phenotype refers to the absence of ocular tissue in the orbit of eye. Patients may have unilateral or bilateral anophthalmia, and generally have short palpebral fissures and small orbits. Anophthalmia may be isolated or associated with a broader syndrome and may have genetic or environmental causes. However, genetic cause has been defined in only a small proportion of cases, therefore, a consanguineous Pakistani family of the Pashtoon ethnic group, with isolated clinical anophthalmia was investigated using linkage mapping. A family pedigree was created to trace the possible mode of inheritance of the disease. Blood samples were collected from affected as well as normal members of this family, and screened for disease-associated mutations. This family was analyzed for linkage to all the known loci of clinical anophthalmia, using microsatellite short tandem repeat (STR) markers. Direct sequencing was performed to find out disease-associated mutations in the candidate gene. This family with isolated clinical anophthalmia, was mapped to the SOX2 gene that is located at chromosome 3q26.3-q27. However, on exonic and regulatory regions mutation screening of the SOX2 gene, the disease-associated mutation was not identified. It showed that another gene responsible for development of the eye might be present at chromosome 3q26.3-q27 and needs to be identified and screened for the disease-associated mutation in this family.

  6. Gene mapping in an anophthalmic pedigree of a consanguineous Pakistani family opened new horizons for research

    Science.gov (United States)

    Ajmal, M; Zafar, S; Hameed, A

    2016-01-01

    ABSTRACT Clinical anophthalmia is a rare inherited disease of the eye and phenotype refers to the absence of ocular tissue in the orbit of eye. Patients may have unilateral or bilateral anophthalmia, and generally have short palpebral fissures and small orbits. Anophthalmia may be isolated or associated with a broader syndrome and may have genetic or environmental causes. However, genetic cause has been defined in only a small proportion of cases, therefore, a consanguineous Pakistani family of the Pashtoon ethnic group, with isolated clinical anophthalmia was investigated using linkage mapping. A family pedigree was created to trace the possible mode of inheritance of the disease. Blood samples were collected from affected as well as normal members of this family, and screened for disease-associated mutations. This family was analyzed for linkage to all the known loci of clinical anophthalmia, using microsatellite short tandem repeat (STR) markers. Direct sequencing was performed to find out disease-associated mutations in the candidate gene. This family with isolated clinical anophthalmia, was mapped to the SOX2 gene that is located at chromosome 3q26.3-q27. However, on exonic and regulatory regions mutation screening of the SOX2 gene, the disease-associated mutation was not identified. It showed that another gene responsible for development of the eye might be present at chromosome 3q26.3-q27 and needs to be identified and screened for the disease-associated mutation in this family. PMID:27785411

  7. Gene Environment Interactions and Predictors of Colorectal Cancer in Family-Based, Multi-Ethnic Groups.

    Science.gov (United States)

    Shiao, S Pamela K; Grayson, James; Yu, Chong Ho; Wasek, Brandi; Bottiglieri, Teodoro

    2018-02-16

    For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls ( p relation to gene-environment interactions in the prevention of CRC.

  8. Aux/IAA Gene Family in Plants: Molecular Structure, Regulation, and Function

    Directory of Open Access Journals (Sweden)

    Jie Luo

    2018-01-01

    Full Text Available Auxin plays a crucial role in the diverse cellular and developmental responses of plants across their lifespan. Plants can quickly sense and respond to changes in auxin levels, and these responses involve several major classes of auxin-responsive genes, including the Auxin/Indole-3-Acetic Acid (Aux/IAA) family, the auxin response factor (ARF) family, small auxin upregulated RNA (SAUR), and the auxin-responsive Gretchen Hagen3 (GH3) family. Aux/IAA proteins are short-lived nuclear proteins comprising several highly conserved domains that are encoded by the auxin early response gene family. These proteins have specific domains that interact with ARFs and inhibit the transcription of genes activated by ARFs. Molecular studies have revealed that Aux/IAA family members can form diverse dimers with ARFs to regulate genes in various ways. Functional analyses of Aux/IAA family members have indicated that they have various roles in plant development, such as root development, shoot growth, and fruit ripening. In this review, recently discovered details regarding the molecular characteristics, regulation, and protein–protein interactions of the Aux/IAA proteins are discussed. These details provide new insights into the molecular basis of the Aux/IAA protein functions in plant developmental processes.

  9. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
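
    The rule-based validation described in this abstract can be illustrated with a minimal sketch. The rule table, pathway names, and phenotype names below are hypothetical placeholders and are not taken from IMG; the sketch only assumes that a phenotype is predicted when every pathway its rule requires is present in the annotation, and that predictions are then compared with observed phenotypes.

```python
# Hypothetical rules: phenotype -> set of pathways that must all be annotated.
# These names are illustrative only and do not come from IMG.
RULES = {
    "phenotype_A": {"pathway_1", "pathway_2"},
    "phenotype_B": {"pathway_3"},
}


def predict_phenotypes(annotated_pathways, rules=RULES):
    """Return the phenotypes whose required pathways are all annotated."""
    return {name for name, required in rules.items()
            if required <= annotated_pathways}


def compare(predicted, observed):
    """Split phenotypes into agreeing and conflicting groups."""
    return {
        "consistent": predicted & observed,
        "predicted_only": predicted - observed,  # possible over-annotation
        "observed_only": observed - predicted,   # possible missing annotation
    }


# Example genome annotation and experimentally observed phenotypes (made up).
annotated = {"pathway_1", "pathway_2"}
observed = {"phenotype_A", "phenotype_B"}
report = compare(predict_phenotypes(annotated), observed)
```

    In this toy case `phenotype_B` is observed but not predicted, which is the kind of discrepancy that would prompt a curator to look for a missing annotation of `pathway_3`.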

  10. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  11. Dichotomy in the NRT gene families of dicots and grass species.

    Directory of Open Access Journals (Sweden)

    Darren Plett

    Full Text Available A large proportion of the nitrate (NO3-) acquired by plants from soil is actively transported via members of the NRT families of NO3- transporters. In Arabidopsis, the NRT1 family has eight functionally characterised members and predominantly comprises low-affinity transporters; the NRT2 family contains seven members which appear to be high-affinity transporters; and there are two NRT3 (NAR2) family members which are known to participate in high-affinity transport. A modified reciprocal best hit (RBH) approach was used to identify putative orthologues of the Arabidopsis NRT genes in the four fully sequenced grass genomes (maize, rice, sorghum, Brachypodium). We also included the poplar genome in our analysis to establish whether differences between Arabidopsis and the grasses may be generally applicable to monocots and dicots. Our analysis reveals fundamental differences between Arabidopsis and the grass species in the gene number and family structure of all three families of NRT transporters. All grass species possessed additional NRT1.1 orthologues and appear to lack NRT1.6/NRT1.7 orthologues. There is significant separation in the NRT2 phylogenetic tree between NRT2 genes from dicots and grass species. This indicates that determination of function of NRT2 genes in grass species will not be possible in cereals based simply on sequence homology to functionally characterised Arabidopsis NRT2 genes and that proper functional analysis will be required. Arabidopsis has a unique NRT3.2 gene which may be a fusion of the NRT3.1 and NRT3.2 genes present in all other species examined here. This work provides a framework for future analysis of NO3- transporters and NO3- transport in grass crop species.
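
    As an illustration of the reciprocal best hit idea mentioned in this abstract, the following is a minimal sketch of the plain (unmodified) RBH criterion; the modifications used by the authors are not described here and are not reproduced. Gene names and similarity scores are hypothetical, and the scores are assumed to come from some pairwise search such as BLAST run in both directions.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # genome A genes searched against genome B
    reverse = best_hits(b_vs_a)   # genome B genes searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores; gene identifiers are placeholders, not real loci.
a_vs_b = {
    ("At_geneA", "Os_gene1"): 900.0,
    ("At_geneA", "Os_gene2"): 700.0,
    ("At_geneB", "Os_gene1"): 850.0,
}
b_vs_a = {
    ("Os_gene1", "At_geneA"): 910.0,
    ("Os_gene1", "At_geneB"): 800.0,
    ("Os_gene2", "At_geneA"): 690.0,
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

    Here `At_geneB` finds `Os_gene1` as its best hit, but `Os_gene1` prefers `At_geneA`, so only the `At_geneA`/`Os_gene1` pair is kept as a putative orthologue pair. Lineage-specific duplicates like `At_geneB` are exactly what a strict RBH criterion drops.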

  12. The role of IL-4 gene 70 bp VNTR and ACE gene I/D variants in Familial Mediterranean fever.

    Science.gov (United States)

    Yigit, Serbülent; Tural, Sengul; Tekcan, Akın; Tasliyurt, Turker; Inanir, Ahmet; Uzunkaya, Süheyla; Kismali, Gorkem

    2014-05-01

    Familial Mediterranean fever (FMF) is characterized by recurrent attacks of fever and inflammation in the peritoneum, synovium, or pleura, accompanied by pain. It is an autosomal recessive disease caused by mutations in the MEFV (MEditerranean FeVer) gene. Patients with similar genotypes exhibit phenotypic diversity. As a result, the variations in different genes could be responsible for the clinical findings of this disease. In previous studies genes encoding Angiotensin-Converting Enzyme (ACE) and IL-4 (Interleukin-4) were found to be associated with rheumatologic and autoimmune diseases. In the present study we hypothesized whether ACE I/D or IL-4 70 bp variable tandem repeats (VNTR) genes are associated with FMF and its clinical findings in Turkish patients. Genomic DNA obtained from 670 persons (339 patients with FMF and 331 healthy controls) was used in the study. Genotypes for an ACE gene I/D polymorphism and IL-4 gene 70 bp VNTR were determined by polymerase chain reaction with specific primers. To our knowledge, this is the first study examining ACE gene I/D polymorphism and IL-4 gene 70 bp VNTR polymorphism in FMF patients. As a result, there was a statistically significant difference between the groups with respect to genotype distribution (pACE gene DD genotype was associated with an increased risk in FMF [pACE genotype frequencies according to the clinical characteristics, we found a statistically significant association between DD+ID genotype and fever (p=0.04). In addition IL-4 gene P1P1 genotype was associated with FMF (pACE gene and P1 allele or P1P1 genotype of IL-4 gene may be important molecular markers for susceptibility of FMF. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. [PAX3 gene mutation analysis for two Waardenburg syndrome type Ⅰ families and their prenatal diagnosis].

    Science.gov (United States)

    Bai, Y; Liu, N; Kong, X D; Yan, J; Qin, Z B; Wang, B

    2016-12-07

    Objective: To analyze the mutations of PAX3 gene in two Waardenburg syndrome type Ⅰ (WS1) pedigrees and make prenatal diagnosis for the high-risk 18-week-old fetus. Methods: PAX3 gene was first analyzed by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA) for detecting pathogenic mutation of the probands of the two pedigrees. The mutations were confirmed by MLPA and Sanger in parents and unrelated healthy individuals. Prenatal genetic diagnosis for the high-risk fetus was performed by amniotic fluid cell after genotyping. Results: A heterozygous PAX3 gene gross deletion (E7 deletion) was identified in all patients from WS1-01 family, and not found in 20 healthy individuals. Prenatal diagnosis in WS1-01 family indicated that the fetus was normal. Molecular studies identified a novel deletion mutation c.1385_1386delCT within the PAX3 gene in all affected WS1-02 family members, but in none of the unaffected relatives and 200 healthy individuals. Conclusions: PAX3 gene mutation is etiological for two WS1 families. Sanger sequencing plus MLPA is effective and accurate for making gene diagnosis and prenatal diagnosis.

  14. Distribution of mutations in the PEX gene in families with X-linked hypophosphataemic rickets (HYP).

    Science.gov (United States)

    Rowe, P S; Oudet, C L; Francis, F; Sinding, C; Pannetier, S; Econs, M J; Strom, T M; Meitinger, T; Garabedian, M; David, A; Macher, M A; Questiaux, E; Popowska, E; Pronicka, E; Read, A P; Mokrzycki, A; Glorieux, F H; Drezner, M K; Hanauer, A; Lehrach, H; Goulding, J N; O'Riordan, J L

    1997-04-01

    Mutations in the PEX gene at Xp22.1 (phosphate-regulating gene with homologies to endopeptidases, on the X-chromosome), are responsible for X-linked hypophosphataemic rickets (HYP). Homology of PEX to the M13 family of Zn2+ metallopeptidases which include neprilysin (NEP) as prototype, has raised important questions regarding PEX function at the molecular level. The aim of this study was to analyse 99 HYP families for PEX gene mutations, and to correlate predicted changes in the protein structure with Zn2+ metallopeptidase gene function. Primers flanking 22 characterised exons were used to amplify DNA by PCR, and SSCP was then used to screen for mutations. Deletions, insertions, nonsense mutations, stop codons and splice mutations occurred in 83% of families screened for in all 22 exons, and 51% of a separate set of families screened in 17 PEX gene exons. Missense mutations in four regions of the gene were informative regarding function, with one mutation in the Zn2+-binding site predicted to alter substrate enzyme interaction and catalysis. Computer analysis of the remaining mutations predicted changes in secondary structure, N-glycosylation, protein phosphorylation and catalytic site molecular structure. The wide range of mutations that align with regions required for protease activity in NEP suggests that PEX also functions as a protease, and may act by processing factor(s) involved in bone mineral metabolism.

  15. Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

    Directory of Open Access Journals (Sweden)

    Nordlund Henri R

    2005-03-01

    Full Text Available Abstract Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins.

  16. Two Paralogous Families of a Two-Gene Subtilisin Operon Are Widely Distributed in Oral Treponemes

    Science.gov (United States)

    Correia, Frederick F.; Plummer, Alvin R.; Ellen, Richard P.; Wyss, Chris; Boches, Susan K.; Galvin, Jamie L.; Paster, Bruce J.; Dewhirst, Floyd E.

    2003-01-01

    Certain oral treponemes express a highly proteolytic phenotype and have been associated with periodontal diseases. The periodontal pathogen Treponema denticola produces dentilisin, a serine protease of the subtilisin family. The two-gene operon prcA-prtP is required for expression of active dentilisin (PrtP), a putative lipoprotein attached to the treponeme's outer membrane or sheath. The purpose of this study was to examine the diversity and structure of treponemal subtilisin-like proteases in order to better understand their distribution and function. The complete sequences of five prcA-prtP operons were determined for Treponema lecithinolyticum, “Treponema vincentii,” and two canine species. Partial operon sequences were obtained for T. socranskii subsp. 04 as well as 450- to 1,000-base fragments of prtP genes from four additional treponeme strains. Phylogenetic analysis demonstrated that the sequences fall into two paralogous families. The first family includes the sequence from T. denticola. Treponemes possessing this operon family express chymotrypsin-like protease activity and can cleave the substrate N-succinyl-alanyl-alanyl-prolyl-phenylalanine-p-nitroanilide (SAAPFNA). Treponemes possessing the second paralog family do not possess chymotrypsin-like activity or cleave SAAPFNA. Despite examination of a range of protein and peptide substrates, the specificity of the second protease family remains unknown. Each of the fully sequenced prcA and prtP genes contains a 5′ hydrophobic leader sequence with a treponeme lipobox. The two paralogous families of treponeme subtilisins represent a new subgroup within the subtilisin family of proteases and are the only subtilisin lipoprotein family. The present study demonstrated that the subtilisin paralogs comprising a two-gene operon are widely distributed among treponemes. PMID:14617650

  17. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  18. The Mycobacterium leprae antigen 85 complex gene family: identification of the genes for the 85A, 85C, and related MPT51 proteins

    NARCIS (Netherlands)

    Rinke de Wit, T. F.; Bekelie, S.; Osland, A.; Wieles, B.; Janson, A. A.; Thole, J. E.

    1993-01-01

    The genes for two novel members (designated 85A and 85C) of the Mycobacterium leprae antigen 85 complex family of proteins and the gene for the closely related M. leprae MPT51 protein were isolated. The complete DNA sequence of the M. leprae 85C gene and partial sequences of the 85A and MPT51 genes

  19. Whole-Exome Sequencing Identifies One De Novo Variant in the FGD6 Gene in a Thai Family with Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Chuphong Thongnak

    2018-01-01

    Full Text Available Autism spectrum disorder (ASD) has a strong genetic basis, although the genetics of autism is complex and remains unclear. Genetic tests such as microarray or sequencing have been widely used to identify autism markers, but they are unsuccessful in several cases. The objective of this study is to identify causative variants of autism in two Thai families by using a whole-exome sequencing technique. Whole-exome sequencing was performed with autism-affected children from two unrelated families. Each sample was sequenced on the SOLiD 5500xl Genetic Analyzer system, followed by a combined bioinformatics pipeline including annotation and filtering processes to identify candidate variants. Candidate variants were validated, and the segregation study with other family members was performed using Sanger sequencing. This study identified a possible causative variant for ASD, c.2951G>A, in the FGD6 gene. We demonstrated the potential for identifying genetic variants associated with ASD using whole-exome sequencing and a bioinformatics filtering procedure. These techniques could be useful in identifying possible causative ASD variants, especially in cases in which variants cannot be identified by other techniques.
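
    The filtering step of such a pipeline might look roughly like the sketch below. The abstract does not give the authors' actual criteria, so everything here is an assumption made for illustration: the record layout, the genotype strings ("0/0" homozygous reference, "0/1" heterozygous), the depth threshold, and the placeholder variant identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    """One variant site genotyped in a child and both parents (hypothetical layout)."""
    variant_id: str
    child: str
    mother: str
    father: str
    depth: int  # read depth at the site in the child


def is_candidate_de_novo(call, min_depth=20):
    """Keep sites where the child is heterozygous and both parents are reference.

    The depth cut-off of 20 is an arbitrary illustrative value.
    """
    return (call.child == "0/1"
            and call.mother == "0/0"
            and call.father == "0/0"
            and call.depth >= min_depth)


# Made-up calls; identifiers are placeholders, not real variants.
calls = [
    TrioCall("var_1", "0/1", "0/0", "0/0", 45),  # candidate de novo
    TrioCall("var_2", "0/1", "0/1", "0/0", 60),  # inherited from mother
    TrioCall("var_3", "0/1", "0/0", "0/0", 8),   # too little coverage
]
candidates = [c.variant_id for c in calls if is_candidate_de_novo(c)]
```

    Candidates that survive a filter of this kind would still need independent confirmation, which is the role Sanger sequencing plays in the study.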

  20. Analysis of factor VIII gene inversions in 164 unrelated hemophilia A families

    Energy Technology Data Exchange (ETDEWEB)

    Vnencak-Jones, L.; Phillips, J.A. III; Janco, R.L. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States)] [and others

    1994-09-01

    Hemophilia A is an X-linked recessive disease with variable phenotype and both heterogeneous and widespread mutations in the factor VIII (F8) gene. As a result, diagnostic carrier or prenatal testing often relies upon laborious DNA linkage analysis. Recently, inversion mutations resulting from an intrachromosomal recombination between DNA sequences in one of two A genes ~500 kb upstream from the F8 gene and a homologous A gene in intron 22 of the F8 gene were identified and found in 45% of severe hemophiliacs. We have analyzed banked DNA collected since 1986 from affected males or obligate carrier females representing 164 unrelated hemophilia A families. The disease was sporadic in 37%, familial in 54% and in 10% of families incomplete information was given. A unique deletion was identified in 1/164, a normal pattern was observed in 110/164 (67%), and 53/164 (32%) families had inversion mutations with 43/53 (81%) involving the distal A gene (R3 pattern) and 10/53 (19%) involving the proximal A gene (R2 pattern). While 19% of all rearrangements were R2, in 35 families with severe disease (< 1% VIII:C activity) all 16 rearrangements seen were R3. In 18 families with the R3 pattern and known activities, 16 (89%) had levels < 1%, with the remaining 2 families having ≤ 2.4% activity. Further, 18 referrals specifically noted the production of inhibitors and 8/18 (45%) had the R3 pattern. Our findings demonstrate that the R3 inversion mutation pattern is (1) only seen with VIII:C activity levels of ≤ 2.4%, (2) seen in 46% of families with severe hemophilia, (3) seen in 45% of hemophiliacs known to have inhibitors, (4) not correlated with sporadic or familial disease and (5) not in disequilibrium with the Bcl I or Taq I intron 18 or ST14 polymorphisms. Finally, in families positive for an inversion mutation, direct testing offers a highly accurate and less expensive alternative to DNA linkage analysis.
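
    The proportions quoted in this abstract can be recomputed from the reported counts. The short sketch below uses only the counts given above; the dictionary keys are descriptive labels chosen here, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# (count, total) pairs taken from the abstract; labels are illustrative.
reported = {
    "normal pattern, all families": (110, 164),
    "inversion, all families": (53, 164),
    "R3 among inversions": (43, 53),
    "R2 among inversions": (10, 53),
    "R3 among severe families": (16, 35),
    "activity < 1% among R3 with known activity": (16, 18),
    "R3 among inhibitor referrals": (8, 18),
}

for label, (count, total) in reported.items():
    print(f"{label}: {count}/{total} = {percent(count, total)}%")
```

    Under this rounding, 8/18 comes out at 44%, whereas the abstract gives 45%; the other recomputed values match the figures as printed.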

  1. Mutation analysis of pre-mRNA splicing genes in Chinese families with retinitis pigmentosa

    Science.gov (United States)

    Pan, Xinyuan; Chen, Xue; Liu, Xiaoxing; Gao, Xiang; Kang, Xiaoli; Xu, Qihua; Chen, Xuejuan; Zhao, Kanxing; Zhang, Xiumei; Chu, Qiaomei; Wang, Xiuying

    2014-01-01

    Purpose Seven genes involved in precursor mRNA (pre-mRNA) splicing have been implicated in autosomal dominant retinitis pigmentosa (adRP). We sought to detect mutations in all seven genes in Chinese families with RP, to characterize the relevant phenotypes, and to evaluate the prevalence of mutations in splicing genes in patients with adRP. Methods Six unrelated families from our adRP cohort (42 families) and two additional families with RP with uncertain inheritance mode were clinically characterized in the present study. Targeted sequence capture with next-generation massively parallel sequencing (NGS) was performed to screen mutations in 189 genes including all seven pre-mRNA splicing genes associated with adRP. Variants detected with NGS were filtered with bioinformatics analyses, validated with Sanger sequencing, and prioritized with pathogenicity analysis. Results Mutations in pre-mRNA splicing genes were identified in three individual families including one novel frameshift mutation in PRPF31 (p.Leu366fs*1) and two known mutations in SNRNP200 (p.Arg681His and p.Ser1087Leu). The patients carrying SNRNP200 p.R681H showed rapid disease progression, and the family carrying p.S1087L presented earlier onset ages and more severe phenotypes compared to another previously reported family with p.S1087L. In five other families, we identified mutations in other RP-related genes, including RP1 p.Ser781* (novel), RP2 p.Gln65* (novel) and p.Ile137del (novel), IMPDH1 p.Asp311Asn (recurrent), and RHO p.Pro347Leu (recurrent). Conclusions Mutations in splicing genes identified in the present and our previous study account for 9.5% in our adRP cohort, indicating the important role of pre-mRNA splicing deficiency in the etiology of adRP. Mutations in the same splicing gene, or even the same mutation, could correlate with different phenotypic severities, complicating the genotype–phenotype correlation and clinical prognosis. PMID:24940031

  2. The polyphenol oxidase gene family in land plants: Lineage-specific duplication and expansion

    Directory of Open Access Journals (Sweden)

    Tran Lan T

    2012-08-01

    Full Text Available Abstract Background Plant polyphenol oxidases (PPOs) are enzymes that typically use molecular oxygen to oxidize ortho-diphenols to ortho-quinones. These commonly cause browning reactions following tissue damage, and may be important in plant defense. Some PPOs function as hydroxylases or in cross-linking reactions, but in most plants their physiological roles are not known. To better understand the importance of PPOs in the plant kingdom, we surveyed PPO gene families in 25 sequenced genomes from chlorophytes, bryophytes, lycophytes, and flowering plants. The PPO genes were then analyzed in silico for gene structure, phylogenetic relationships, and targeting signals. Results Many previously uncharacterized PPO genes were uncovered. The moss, Physcomitrella patens, contained 13 PPO genes and Selaginella moellendorffii (spike moss) and Glycine max (soybean) each had 11 genes. Populus trichocarpa (poplar) contained a highly diversified gene family with 11 PPO genes, but several flowering plants had only a single PPO gene. By contrast, no PPO-like sequences were identified in several chlorophyte (green algae) genomes or Arabidopsis (A. lyrata and A. thaliana). We found that many PPOs contained one or two introns often near the 3’ terminus. Furthermore, N-terminal amino acid sequence analysis using ChloroP and TargetP 1.1 predicted that several putative PPOs are synthesized via the secretory pathway, a unique finding as most PPOs are predicted to be chloroplast proteins. Phylogenetic reconstruction of these sequences revealed that large PPO gene repertoires in some species are mostly a consequence of independent bursts of gene duplication, while the lineage leading to Arabidopsis must have lost all PPO genes. Conclusion Our survey identified PPOs in gene families of varying sizes in all land plants except in the genus Arabidopsis. While we found variation in intron numbers and positions, overall PPO gene structure is congruent with the phylogenetic

  3. Genome-wide identification of the SWEET gene family in wheat.

    Science.gov (United States)

    Gao, Yue; Wang, Zi Yuan; Kumar, Vikranth; Xu, Xiao Feng; Yuan, De Peng; Zhu, Xiao Feng; Li, Tian Ya; Jia, Baolei; Xuan, Yuan Hu

    2018-02-05

    The SWEET (sugars will eventually be exported transporter) family is a newly characterized group of sugar transporters. In plants, the key roles of SWEETs in phloem transport, nectar secretion, pollen nutrition, stress tolerance, and plant-pathogen interactions have been identified. SWEET family genes have been characterized in many plant species, but a comprehensive analysis of SWEET members has not yet been performed in wheat. Here, 59 wheat SWEETs (hereafter TaSWEETs) were identified through homology searches. Analyses of phylogenetic relationships, numbers of transmembrane helices (TMHs), gene structures, and motifs showed that TaSWEETs carrying 3-7 TMHs could be classified into four clades with 10 different types of motifs. Examination of the expression patterns of 18 SWEET genes revealed that a few are tissue-specific while most are ubiquitously expressed. In addition, the stem rust-mediated expression patterns of SWEET genes were monitored using a stem rust-susceptible cultivar, 'Little Club' (LC). The resulting data showed that the expression of five out of the 18 SWEETs tested was induced following inoculation. In conclusion, we provide the first comprehensive analysis of the wheat SWEET gene family. Information regarding the phylogenetic relationships, gene structures, and expression profiles of SWEET genes in different tissues and following stem rust disease inoculation will be useful in identifying the potential roles of SWEETs in specific developmental and pathogenic processes. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Evolutionary Pattern and Regulation Analysis to Support Why Diversity Functions Existed within PPAR Gene Family Members

    Directory of Open Access Journals (Sweden)

    Tianyu Zhou

    2015-01-01

    Full Text Available Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct patterns of distribution in tissues and differ in functions. The purpose of this study is to investigate the evolutionary impacts on the functional diversity of PPAR members and the regulatory differences in gene expression patterns. Sixty-three homologous sequences of PPAR genes from 31 species were collected and analyzed. The results showed that the three isolated types of the PPAR gene family may have emerged from two gene duplication events. The conserved HOLI (ligand binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant domains of LCRs may be responsible for their divergence in functions. The positive selection sites in the HOLI domain help PPARs evolve towards diverse functions. The evolutionary variants in the promoter regions and 3′ UTR regions of PPARs result in different transcription factors and miRNAs being involved in regulating PPAR members, which may eventually affect their expression and tissue distributions. These results indicate that gene duplication events, selection pressure on the HOLI domain, and variants in the promoter and 3′ UTR are essential for PPAR evolution and the acquisition of diverse functions.

  5. Evolutionary Pattern and Regulation Analysis to Support Why Diversity Functions Existed within PPAR Gene Family Members.

    Science.gov (United States)

    Zhou, Tianyu; Yan, Xiping; Wang, Guosong; Liu, Hehe; Gan, Xiang; Zhang, Tao; Wang, Jiwen; Li, Liang

    2015-01-01

    Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct patterns of distribution in tissues and differ in functions. The purpose of this study is to investigate the evolutionary impacts on the functional diversity of PPAR members and the regulatory differences in gene expression patterns. Sixty-three homologous sequences of PPAR genes from 31 species were collected and analyzed. The results showed that the three isolated types of the PPAR gene family may have emerged from two gene duplication events. The conserved HOLI (ligand binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant domains of LCRs may be responsible for their divergence in functions. The positive selection sites in the HOLI domain help PPARs evolve towards diverse functions. The evolutionary variants in the promoter regions and 3' UTR regions of PPARs result in different transcription factors and miRNAs being involved in regulating PPAR members, which may eventually affect their expression and tissue distributions. These results indicate that gene duplication events, selection pressure on the HOLI domain, and variants in the promoter and 3' UTR are essential for PPAR evolution and the acquisition of diverse functions.

  6. Evolutionary mechanisms driving the evolution of a large polydnavirus gene family coding for protein tyrosine phosphatases

    Directory of Open Access Journals (Sweden)

    Serbielle Céline

    2012-12-01

    Full Text Available Abstract Background Gene duplications have been proposed to be the main mechanism involved in genome evolution and in acquisition of new functions. Polydnaviruses (PDVs), symbiotic viruses associated with parasitoid wasps, are ideal model systems to study mechanisms of gene duplications given that PDV genomes consist of virulence genes organized into multigene families. In these systems the viral genome is integrated in a wasp chromosome as a provirus and virus particles containing circular double-stranded DNA are injected into the parasitoids’ hosts and are essential for parasitism success. The viral virulence factors, organized in gene families, are required collectively to induce host immune suppression and developmental arrest. The gene family which encodes protein tyrosine phosphatases (PTPs) has undergone spectacular expansion in several PDV genomes with up to 42 genes. Results Here, we present strong indications that PTP gene family expansion occurred via classical mechanisms: by duplication of large segments of the chromosomally integrated form of the virus sequences (segmental duplication), by tandem duplications within this form and by dispersed duplications. We also propose a novel duplication mechanism specific to PDVs that involves viral circle reintegration into the wasp genome. The PTP copies produced were shown to undergo conservative evolution along with episodes of adaptive evolution. In particular recently produced copies have undergone positive selection in sites most likely involved in defining substrate selectivity. Conclusion The results provide evidence about the dynamic nature of polydnavirus proviral genomes. Classical and PDV-specific duplication mechanisms have been involved in the production of new gene copies. Selection pressures associated with antagonistic interactions with parasitized hosts have shaped these genes used to manipulate lepidopteran physiology with evidence for positive selection involved in

  7. Identification and characterization of NF-YB family genes in tung tree.

    Science.gov (United States)

    Yang, Susu; Wang, Yangdong; Yin, Hengfu; Guo, Haobo; Gao, Ming; Zhu, Huiping; Chen, Yicun

    2015-12-01

    The NF-YB transcription factor gene family encodes a subunit of the CCAAT box-binding factor (CBF), a highly conserved trimeric activator that strongly binds to the CCAAT box promoter element. Studies on model plants have shown that NF-YB proteins participate in important developmental and physiological processes, but little is known about NF-YB proteins in trees. Here, we identified seven NF-YB transcription factor-encoding genes in Vernicia fordii, an important oilseed tree in China. A phylogenetic analysis separated the genes into two groups: non-LEC1 type (VfNF-YB1, 5, 7, 9, 11, 13) and LEC1 type (VfNF-YB14). A gene structure analysis showed that VfNF-YB5 has three introns and the other genes have no introns. The seven VfNF-YB sequences contain highly conserved domains, a disordered region at the N terminus, and two long helix structures at the C terminus. Phylogenetic analyses showed that VfNF-YB family genes are highly homologous to GmNF-YB genes, and many of them are closely related to functionally characterized NF-YBs. In expression analyses of various tissues (root, stem, leaf, and kernel) and of the root during pathogen infection, VfNF-YB1, 5, and 11 were dominantly expressed in kernels, and VfNF-YB7 and 9 were expressed only in the root. Different VfNF-YB family genes showed different responses to pathogen infection, suggesting that they play different roles in the pathogen response. Together, these findings represent the first extensive evaluation of the NF-YB family in tung tree and provide a foundation for dissecting the functions of VfNF-YB genes in seed development, stress adaptation, fatty acid synthesis, and pathogen response.

  8. FIGENIX: Intelligent automation of genomic annotation: expertise integration in a new software platform

    Directory of Open Access Journals (Sweden)

    Pontarotti Pierre

    2005-08-01

    Background: Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes, which consists of detecting genes' position and structure and inferring their function (as well as that of other features of genomes). Structural and functional annotation both require the complex chaining of numerous software tools, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage the huge amounts of data released by sequencing projects. Several pipelines already automate some of this complex chaining but still require a substantial contribution from biologists to supervise and control the results at various steps. Results: Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable of substituting for human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which makes it possible, for example, to make key decisions, check intermediate results or refine the dataset). The quality of the results produced by FIGENIX is comparable to that obtained by expert biologists, with a drastic gain in terms of time costs and avoidance of errors due to human manipulation of data. Conclusion: The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could be easily adapted to new or more specialized pipelines, such as the annotation of miRNAs, the classification of complex multigenic families, and the annotation of regulatory elements and other genomic features of interest.

  9. AtTZF gene family localizes to cytoplasmic foci

    OpenAIRE

    Pomeranz, Marcelo; Lin, Pei-Chi; Finer, John; Jang, Jyan-Chyun

    2010-01-01

    In eukaryotes, mRNA turnover and translational repression represent important regulatory steps in gene expression. Curiously, when under cellular stresses, factors involved in these processes aggregate into cytoplasmic foci known as Processing bodies (P-bodies) and Stress Granules (SGs). In animals, CCCH Tandem Zinc Finger (TZF) proteins play important roles in mRNA decay within P-bodies. TTP, a P-body localized mammalian TZF, can bind to the 3'UTRs of mRNAs containing AU-rich elements (AREs)...

  10. A mutation in the Norrie disease gene (NDP) associated with X-linked familial exudative vitreoretinopathy.

    Science.gov (United States)

    Chen, Z Y; Battinelli, E M; Fielder, A; Bundey, S; Sims, K; Breakefield, X O; Craig, I W

    1993-10-01

    Familial exudative vitreoretinopathy (FEVR) is a hereditary disorder characterized by an abnormality of the peripheral retina. Both autosomal dominant (adFEVR) and X-linked (XLFEVR) forms have been described, but the biochemical defect(s) underlying the symptoms are unknown. Molecular analysis of the Norrie gene locus (NDP) in a four-generation FEVR family (shown previously to exhibit linkage to the X-chromosome markers DXS228 and MAOA (Xp11.4-p11.3)) reveals a missense mutation in the highly conserved region of the NDP gene. The mutation, which causes a neutral amino acid substitution (Leu124Phe), was detected in all of the affected males, but not in the unaffected family members or in normal controls. The observations suggest that phenotypes of both XLFEVR and Norrie disease can result from mutations in the same gene.

  11. [Mutation analysis of FGFR3 gene in a family featuring hereditary dwarfism].

    Science.gov (United States)

    Zhang, Qiong; Jiang, Hai-ou; Quan, Qing-li; Li, Jun; He, Ting; Huang, Xue-shuang

    2011-12-01

    To investigate the clinical symptoms and potential mutations in the FGFR3 gene in a family featuring hereditary dwarfism, in order to establish a diagnosis and provide prenatal diagnosis. Five patients and two unaffected relatives from the family, together with 100 healthy controls, were recruited. Genomic DNA was extracted. Exons 10 and 13 of the FGFR3 gene were amplified using polymerase chain reaction (PCR). PCR products were sequenced in both directions. All patients had similar features, including short stature, short limbs and lumbar hyperlordosis, but normal craniofacial features. A heterozygous mutation G1620T (N540K) was identified in the cDNA from all patients but not in the unaffected relatives or the 100 control subjects. A heterozygous G380R mutation was excluded. The hereditary dwarfism in this family is hypochondroplasia (HCH) caused by an N540K mutation in the FGFR3 gene.

  12. Phylogenetic analysis of the MS4A and TMEM176 gene families.

    Directory of Open Access Journals (Sweden)

    Jonathan Zuccolo

    2010-02-01

    The MS4A gene family in humans includes CD20 (MS4A1), FcRbeta (MS4A2), Htm4 (MS4A3), and at least 13 other syntenic genes encoding membrane proteins, most having characteristic tetraspanning topology. Expression of MS4A genes is variable in tissues throughout the body; however, several are limited to cells in the hematopoietic system where they have known roles in immune cell functions. Genes in the small TMEM176 group share significant sequence similarity with MS4A genes and there is evidence of immune function of at least one of the encoded proteins. In this study, we examined the evolutionary history of the MS4A/TMEM176 families as well as tissue expression of the phylogenetically earliest members, in order to investigate their possible origins in immune cells. Orthologs of human MS4A genes were found only in mammals; however, MS4A gene homologs were found in most jawed vertebrates. TMEM176 genes were found only in mammals and bony fish. Several unusual MS4A genes having 2 or more tandem MS4A sequences were identified in the chicken (Gallus gallus) and early mammals (opossum, Monodelphis domestica, and platypus, Ornithorhynchus anatinus). A large number of highly conserved MS4A and TMEM176 genes was found in zebrafish (Danio rerio). The most primitive organism identified to have MS4A genes was spiny dogfish (Squalus acanthias). Tissue expression of MS4A genes in S. acanthias and D. rerio showed no evidence of expression restricted to the hematopoietic system. Our findings suggest that MS4A genes first appeared in cartilaginous fish with expression outside of the immune system, and have since diversified in many species into their modern forms with expression and function in both immune and nonimmune cells.

  13. Phylogenetic Analysis of the MS4A and TMEM176 Gene Families

    Science.gov (United States)

    Zuccolo, Jonathan; Bau, Jeremy; Childs, Sarah J.; Goss, Greg G.; Sensen, Christoph W.; Deans, Julie P.

    2010-01-01

    Background: The MS4A gene family in humans includes CD20 (MS4A1), FcRβ (MS4A2), Htm4 (MS4A3), and at least 13 other syntenic genes encoding membrane proteins, most having characteristic tetraspanning topology. Expression of MS4A genes is variable in tissues throughout the body; however, several are limited to cells in the hematopoietic system where they have known roles in immune cell functions. Genes in the small TMEM176 group share significant sequence similarity with MS4A genes and there is evidence of immune function of at least one of the encoded proteins. In this study, we examined the evolutionary history of the MS4A/TMEM176 families as well as tissue expression of the phylogenetically earliest members, in order to investigate their possible origins in immune cells. Principal Findings: Orthologs of human MS4A genes were found only in mammals; however, MS4A gene homologs were found in most jawed vertebrates. TMEM176 genes were found only in mammals and bony fish. Several unusual MS4A genes having 2 or more tandem MS4A sequences were identified in the chicken (Gallus gallus) and early mammals (opossum, Monodelphis domestica, and platypus, Ornithorhynchus anatinus). A large number of highly conserved MS4A and TMEM176 genes was found in zebrafish (Danio rerio). The most primitive organism identified to have MS4A genes was spiny dogfish (Squalus acanthias). Tissue expression of MS4A genes in S. acanthias and D. rerio showed no evidence of expression restricted to the hematopoietic system. Conclusions/Significance: Our findings suggest that MS4A genes first appeared in cartilaginous fish with expression outside of the immune system, and have since diversified in many species into their modern forms with expression and function in both immune and nonimmune cells. PMID:20186339

  14. Characterization of the bovine pregnancy-associated glycoprotein gene family – analysis of gene sequences, regulatory regions within the promoter and expression of selected genes

    Directory of Open Access Journals (Sweden)

    Walker Angela M

    2009-04-01

    Background: The pregnancy-associated glycoproteins (PAGs) belong to a large family of aspartic peptidases expressed exclusively in the placenta of species in the Artiodactyla order. In cattle, the PAG gene family is comprised of at least 22 transcribed genes, as well as some variants. Phylogenetic analyses have shown that the PAG family segregates into 'ancient' and 'modern' groupings. Along with sequence differences between family members, there are clear distinctions in their spatio-temporal distribution and in their relative level of expression. In this report, (1) we performed an in silico analysis of the bovine genome to further characterize the PAG gene family, (2) we scrutinized proximal promoter sequences of the PAG genes to evaluate the evolutionary pressures operating on them and to identify putative regulatory regions, (3) we determined the relative transcript abundance of selected PAGs during pregnancy, and (4) we performed a preliminary characterization of the putative regulatory elements for one of the candidate PAGs, bovine (bo) PAG-2. Results: From our analysis of the bovine genome, we identified 18 distinct PAG genes and 14 pseudogenes. We observed that the first 500 base pairs upstream of the translational start site contained multiple regions that are conserved among all boPAGs. However, a preponderance of conserved regions that harbor recognition sites for putative transcription factors (TFs) were found to be unique to the modern boPAG grouping, but not the ancient boPAGs. We gathered evidence by means of Q-PCR and screening of EST databases to show that boPAG-2 is the most abundant of all boPAG transcripts. Finally, we provided preliminary evidence for the role of ETS- and DDVL-related TFs in the regulation of the boPAG-2 gene. Conclusion: PAGs represent a relatively large gene family in the bovine genome. The proximal promoter regions of these genes display differences in putative TF binding sites, likely contributing to observed

  15. Diagnosing CADASIL using MRI: evidence from families with known mutations of Notch 3 gene

    International Nuclear Information System (INIS)

    Chawda, S.J.; Lange, R.P.J. de; St-Clair, D.; Hourihan, M.D.; Halpin, S.F.S.

    2000-01-01

    Clinical data and MRI findings are presented on 18 subjects from two families with neuropathologically confirmed CADASIL. DNA analysis revealed mutations in exon 4 of Notch 3 gene in both families. All family members with mutations in Notch 3 gene had extensive abnormalities on MRI, principally lesions in the white matter of the frontal lobes and in the external capsules. Of several family members in whom a diagnosis of CADASIL was suspected on the basis of minor symptoms, one had MRI changes consistent with CADASIL; none of these cases carried a mutation in the Notch 3 gene. MRI and clinical features that may alert the radiologist to the diagnosis of CADASIL are reviewed. However, a wide differential diagnosis exists for the MRI appearances of CADASIL, including multiple sclerosis and small-vessel disease secondary to hypertension. The definitive diagnosis cannot be made on MRI alone and requires additional evidence, where available, from a positive family history and by screening DNA for mutations of Notch 3 gene. (orig.)

  16. New mutations in the NHS gene in Nance-Horan Syndrome families from the Netherlands.

    Science.gov (United States)

    Florijn, Ralph J; Loves, Willem; Maillette de Buy Wenniger-Prick, Liesbeth J J M; Mannens, Marcel M A M; Tijmes, Nel; Brooks, Simon P; Hardcastle, Alison J; Bergen, Arthur A B

    2006-09-01

    Mutations in the NHS gene cause Nance-Horan Syndrome (NHS), a rare X-chromosomal recessive disorder with variable features, including congenital cataract, microphthalmia, a peculiar form of the ear and dental anomalies. We investigated the NHS gene in four additional families with NHS from the Netherlands by dHPLC and direct sequencing. We identified a unique mutation in each family. Three of these four mutations had not been reported before. We report here the first splice site sequence alteration mutation and three protein-truncating mutations. Our results suggest that X-linked cataract and NHS are allelic disorders.

  17. Evaluation of the norrie disease gene in a family with incontinentia pigmenti.

    Science.gov (United States)

    Shastry, B S; Trese, M T

    2000-01-01

    Incontinentia pigmenti (IP) is an ectodermal multisystem disorder which can affect dental, ocular, cardiac and neurologic structures. The ocular changes of IP can have a very similar appearance to the retinal detachment of X-linked familial exudative vitreoretinopathy, which has been shown to be caused by mutations in the Norrie disease gene. Therefore, it is of interest to determine whether similar mutations in the gene can account for the retinal pathology in patients with IP. To test this hypothesis, we analyzed the entire Norrie disease gene in a family with IP by single-strand conformational polymorphism analysis followed by DNA sequencing. The sequencing data revealed no disease-specific sequence alterations. These data suggest that the ocular findings of IP are perhaps associated with different genes and that there is no direct relationship between the genotype and phenotype. Copyright 2000 S. Karger AG, Basel

  18. Evolutionary relationship and structural characterization of the EPF/EPFL gene family.

    Directory of Open Access Journals (Sweden)

    Naoki Takata

    EPF1-EPF2 and EPFL9/Stomagen act antagonistically in regulating leaf stomatal density. The aim of this study was to elucidate the evolutionary functional divergence of EPF/EPFL family genes. Phylogenetic analyses showed that AtEPFL9/Stomagen-like genes are conserved only in vascular plants and are closely related to AtEPF1/EPF2-like genes. Modeling showed that EPF/EPFL peptides share a common 3D structure consisting of a scaffold and a loop. Molecular dynamics simulation suggested that AtEPF1/EPF2-like peptides form an additional disulfide bond in their loop regions and show greater flexibility in these regions than AtEPFL9/Stomagen-like peptides. This study uncovered the evolutionary relationship and the conformational divergence of proteins encoded by the EPF/EPFL family genes.

  19. Evolutionary relationship and structural characterization of the EPF/EPFL gene family.

    Science.gov (United States)

    Takata, Naoki; Yokota, Kiyonobu; Ohki, Shinya; Mori, Masashi; Taniguchi, Toru; Kurita, Manabu

    2013-01-01

    EPF1-EPF2 and EPFL9/Stomagen act antagonistically in regulating leaf stomatal density. The aim of this study was to elucidate the evolutionary functional divergence of EPF/EPFL family genes. Phylogenetic analyses showed that AtEPFL9/Stomagen-like genes are conserved only in vascular plants and are closely related to AtEPF1/EPF2-like genes. Modeling showed that EPF/EPFL peptides share a common 3D structure consisting of a scaffold and a loop. Molecular dynamics simulation suggested that AtEPF1/EPF2-like peptides form an additional disulfide bond in their loop regions and show greater flexibility in these regions than AtEPFL9/Stomagen-like peptides. This study uncovered the evolutionary relationship and the conformational divergence of proteins encoded by the EPF/EPFL family genes.

  20. Positive selection in the SLC11A1 gene in the family Equidae

    DEFF Research Database (Denmark)

    Bayerova, Zuzana; Janova, Eva; Matiasovic, Jan

    2016-01-01

    Immunity-related genes are a suitable model for studying effects of selection at the genomic level. Some of them are highly conserved due to functional constraints and purifying selection, while others are variable and change quickly to cope with the variation of pathogens. The SLC11A1 gene encodes...... a transporter protein mediating antimicrobial activity of macrophages. Little is known about the patterns of selection shaping this gene during evolution. Although it is a typical evolutionarily conserved gene, functionally important polymorphisms associated with various diseases were identified in humans...... and other species. We analyzed the genomic organization, genetic variation, and evolution of the SLC11A1 gene in the family Equidae to identify patterns of selection within this important gene. Nucleotide SLC11A1 sequences were shown to be highly conserved in ten equid species, with more than 97 % sequence...

  1. Genome-wide identification and characterization of the SBP-box gene family in Petunia.

    Science.gov (United States)

    Zhou, Qin; Zhang, Sisi; Chen, Feng; Liu, Baojun; Wu, Lan; Li, Fei; Zhang, Jiaqi; Bao, Manzhu; Liu, Guofeng

    2018-03-12

    SQUAMOSA PROMOTER BINDING PROTEIN (SBP)-box genes encode a family of plant-specific transcription factors (TFs) that play important roles in many growth and development processes, including phase transition, leaf initiation, shoot and inflorescence branching, and fruit development and ripening. The SBP-box gene family has been identified and characterized in many species, but has not been well studied in Petunia, an important ornamental genus. We identified 21 putative SPL genes of Petunia axillaris and P. inflata from the reference genomes of P. axillaris N and P. inflata S6, respectively, which were supported by the transcriptome data. For further confirmation, all 21 genes were also cloned from P. hybrida line W115 (Mitchell diploid). Phylogenetic analysis based on the highly conserved SBP domains arranged PhSPLs in eight groups, analogous to those from Arabidopsis and tomato. Furthermore, the Petunia SPL genes had similar exon-intron structures, and the deduced proteins contained very similar conserved motifs within the same subgroup. Of the 21 PhSPL genes, fourteen were predicted to be potential targets of PhmiR156/157, and the putative miR156/157 response elements (MREs) were located in the coding region of group IV, V, VII and VIII genes, but in the 3'-UTR regions of group VI genes. SPL genes were also identified from two other wild Petunia species, P. integrifolia and P. exserta, based on their transcriptome databases, to investigate the origin of PhSPLs. Phylogenetic analysis and multiple alignments of the coding sequences of PhSPLs and their orthologs from wild species indicated that PhSPLs originated mainly from P. axillaris. qRT-PCR analysis demonstrated differential spatiotemporal expression patterns of PhSPL genes in petunia, and many were expressed predominantly in the axillary buds and/or inflorescences. In addition, overexpression of PhSPL9a and PhSPL9b in Arabidopsis suggested that these genes play a conserved role in promoting the vegetative

  2. Phylogenetic analysis of the expansion of the MATH-BTB gene family in the grasses.

    Science.gov (United States)

    Juranić, Martina; Dresselhaus, Thomas

    2014-01-01

    MATH-BTB proteins are known to act as substrate-specific adaptors of cullin3 (CUL3)-based ubiquitin E3 ligases to target proteins for ubiquitination. In a previous study we reported the presence of 31 MATH-BTB genes in the maize genome and determined the regulatory role of the MATH-BTB protein MAB1 during the meiosis-to-mitosis transition. In contrast to maize, there are only 6 homologous genes in the model plant Arabidopsis, while this family has largely expanded in grasses. Here, we report a phylogenetic analysis of the MATH-BTB gene family in 9 land plant species including various mosses, eudicots, and grasses. We extend a previous classification of the plant MATH-BTB family and additionally arrange the expanded group into 5 grass-specific clades. Synteny studies indicate that expansion occurred to a large extent due to local gene duplications. Expression studies of 3 closely related MATH-BTB genes in maize (MAB1-3) indicate highly specific expression patterns. In summary, this work provides a solid base for further studies comparing genetic and functional information of the MATH-BTB family, especially in the grasses.

  3. [Study of gene mutation and pathogenetic mechanism for a family with Waardenburg syndrome].

    Science.gov (United States)

    Chen, Hongsheng; Liao, Xinbin; Liu, Yalan; He, Chufeng; Zhang, Hua; Jiang, Lu; Feng, Yong; Mei, Lingyun

    2017-08-10

    To explore the pathogenetic mechanism in a family affected with Waardenburg syndrome. Clinical data of the family were collected. Potential mutations of the MITF, SOX10 and SNAI2 genes were screened. Plasmids for wild-type (WT) and mutant MITF proteins were constructed to determine their exogenous expression and subcellular distribution by Western blotting and immunofluorescence assay, respectively. A heterozygous c.763C>T (p.R255X) mutation was detected in exon 8 of the MITF gene in the proband and all other patients from the family. No pathological mutation of the SOX10 or SNAI2 genes was detected. The DNA sequences of the wild-type MITF and mutant MITF R255X plasmids were confirmed. Both proteins were detected with the expected size. WT MITF protein localized only in the nucleus, whereas R255X protein showed aberrant localization in the nucleus as well as the cytoplasm. The c.763C>T mutation of the MITF gene probably underlies the disease in this family. The mutation can affect the subcellular distribution of MITF proteins in vitro, which may shed light on the molecular mechanism of Waardenburg syndrome caused by mutations of the MITF gene.
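    As a rough illustration of how a coding-sequence change such as c.763C>T corresponds to a protein-level change such as p.R255X, the arithmetic can be sketched as below. This is a minimal sketch and not the authors' method: it assumes that coding positions are numbered from the first base of the start codon and that the standard genetic code applies. The function names are hypothetical, and the reference codon is inferred from the genetic code rather than taken from the abstract, which does not state it.

```python
# Illustrative sketch only: uses the standard genetic code and no sequence
# data from the study. Function names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]  # all arginine codons


def codon_position(cds_pos):
    """Map a 1-based coding position to (codon number, 1-based position in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


codon_number, pos_in_codon = codon_position(763)

# Arginine codons that carry a C at that position and become a stop codon
# when the C is replaced by T (assumption: standard genetic code).
nonsense_sources = [
    codon for codon in ARG_CODONS
    if codon[pos_in_codon - 1] == "C"
    and substitute(codon, pos_in_codon, "T") in STOP_CODONS
]

print(codon_number, pos_in_codon, nonsense_sources)
```

    Under these assumptions, position 763 is the first base of codon 255, which agrees with the reported p.R255X, and the only arginine codon that a C>T change at that position turns into a stop codon is CGA (giving TGA).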

  4. Prevalence of variations in melanoma susceptibility genes among Slovenian melanoma families

    Directory of Open Access Journals (Sweden)

    Besic Nikola

    2008-09-01

    Background: Two high-risk genes have been implicated in the development of CM (cutaneous melanoma). Germline mutations of the CDKN2A gene are found in CDK4 gene reported to date. Besides these high-penetrance genes, certain allelic variants of the MC1R gene modify the risk of developing the disease. The aims of our study were: to determine the prevalence of germline CDKN2A mutations and variants in members of families with familial CM and in patients with multiple primary CM; to search for possible CDK4 mutations; and to determine the frequency of variations in the MC1R gene. Methods: From January 2001 until January 2007, 64 individuals were included in the study. The group included 28 patients and 7 healthy relatives belonging to 25 families, 26 patients with multiple primary tumors and 3 children with CM. Additionally, 54 healthy individuals were included as a control group. Mutations and variants of the melanoma susceptibility genes were identified by direct sequencing. Results: Seven families with CDKN2A mutations were discovered (7/25 or 28.0%). The L94Q mutation found in one family had not been previously reported in other populations. The D84N variant, with possible biological impact, was discovered in a patient without a family history but with multiple primary CM. Only one mutation carrier was found in the control group. Further analysis revealed that c.540C>T heterozygous carriers were more common in the group of CM patients and their healthy relatives (11/64 vs. 2/54). One p14ARF variant was discovered in the control group and no mutations of the CDK4 gene were found. The most frequently found variants of the MC1R gene were T314T, V60L, V92M, R151C, R160W and R163Q, with frequencies slightly higher in the group of patients and their relatives than in the group of controls, but the difference was statistically insignificant. Conclusion: The present study has shown a high prevalence of p16INK4A mutations in Slovenian population of

  5. Comprehensive identification and expression analysis of Hsp90s gene family in Solanum lycopersicum.

    Science.gov (United States)

    Zai, W S; Miao, L X; Xiong, Z L; Zhang, H L; Ma, Y R; Li, Y L; Chen, Y B; Ye, S G

    2015-07-14

    Heat shock protein 90 (Hsp90) is a protein produced by plants in response to adverse environmental stresses. In this study, we identified and analyzed Hsp90 gene family members using a bioinformatic method based on genomic data from tomato (Solanum lycopersicum L.). The results showed that tomato contains at least 7 Hsp90 genes distributed on 6 chromosomes; protein lengths ranged from 267 to 794 amino acids. Intron numbers ranged from 2 to 19 in the genes. The phylogenetic tree revealed that Hsp90 genes in tomato (Solanum lycopersicum L.), rice (Oryza sativa L.), and Arabidopsis (Arabidopsis thaliana L.) could be divided into 5 groups, which included 3 pairs of orthologous genes and 4 pairs of paralogous genes. Expression analysis of RNA-sequencing data showed that the Hsp90-1 gene was specifically expressed in mature fruits, while Hsp90-5 and Hsp90-6 showed opposite expression patterns in various tissues of cultivated and wild tomatoes. The expression levels of the Hsp90-1, Hsp90-2, and Hsp90-3 genes in various tissues of cultivated tomatoes were high, while the expression levels of Hsp90-3 and Hsp90-4 were low. Additionally, quantitative real-time polymerase chain reaction showed that these genes were involved in the responses to yellow leaf curl virus in tomato plant leaves. Our results provide a foundation for identifying the function of the Hsp90 gene in tomato.

  6. Characterization of Soybean WRKY Gene Family and Identification of Soybean WRKY Genes that Promote Resistance to Soybean Cyst Nematode.

    Science.gov (United States)

    Yang, Yan; Zhou, Yuan; Chi, Yingjun; Fan, Baofang; Chen, Zhixiang

    2017-12-19

    WRKY proteins are a superfamily of plant transcription factors with important roles in plants. WRKY proteins have been extensively analyzed in plant species including Arabidopsis and rice. Here we report the characterization of the soybean WRKY gene family and a functional analysis of its members in resistance to soybean cyst nematode (SCN), the most important soybean pathogen. By searching the soybean genome, we identified 174 genes encoding WRKY proteins that can be classified into seven groups as established in other plants. WRKY variants, including a WRKY-related protein unique to legumes, have also been identified. Expression analysis reveals both diverse expression patterns in different soybean tissues and preferential expression of specific WRKY groups in certain tissues. Furthermore, a large number of soybean WRKY genes were responsive to salicylic acid. To identify soybean WRKY genes that promote soybean resistance to SCN, we first screened soybean WRKY genes for enhanced SCN resistance when overexpressed in transgenic soybean hairy roots. To confirm the results, we transformed five WRKY genes into an SCN-susceptible soybean cultivar and generated transgenic soybean lines. Transgenic soybean lines overexpressing three WRKY transgenes displayed increased resistance to SCN. Thus, WRKY genes could be explored to develop new soybean cultivars with enhanced resistance to SCN.

  7. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    Science.gov (United States)

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  8. New mutation of the MPZ gene in a family with the Dejerine-Sottas disease phenotype.

    Science.gov (United States)

    Floroskufi, Paraskewi; Panas, Marios; Karadima, Georgia; Vassilopoulos, Demetris

    2007-05-01

    Charcot-Marie-Tooth disease type 1B is associated with mutations in the myelin protein zero gene. In the present study a new myelin protein zero gene mutation (c.89T>C,Ile30Thr) was detected in a family with the Dejerine-Sottas disease phenotype. The results support the hypothesis that severe, early-onset neuropathy may be related to either an alteration of a conserved amino acid or a disruption of the tertiary structure of myelin protein zero.

  9. Genome-wide analysis of the GRAS gene family in Prunus mume.

    Science.gov (United States)

    Lu, Jiuxing; Wang, Tao; Xu, Zongda; Sun, Lidan; Zhang, Qixiang

    2015-02-01

    Prunus mume is an ornamental flower and fruit tree in Rosaceae. We investigated the GRAS gene family to improve the breeding and cultivation of P. mume and other Rosaceae fruit trees. The GRAS gene family encodes transcriptional regulators that have diverse functions in plant growth and development, such as gibberellin and phytochrome A signal transduction, root radial patterning, and axillary meristem formation and gametogenesis in the P. mume genome. Despite the important roles of these genes in plant growth regulation, no findings on the GRAS genes of P. mume have been reported. In this study, we discerned phylogenetic relationships of P. mume GRAS genes, and their locations, structures in the genome and expression levels of different tissues. Out of 46 identified GRAS genes, 45 were located on the 8 P. mume chromosomes. Phylogenetic results showed that these genes could be classified into 11 groups. We found that Group X was P. mume-specific, and three genes of Group IX clustered with the rice-specific gene Os4. We speculated that these genes existed before the divergence of dicotyledons and monocotyledons and were lost in Arabidopsis. Tissue expression analysis indicated that 13 genes showed high expression levels in roots, stems, leaves, flowers and fruits, and were related to plant growth and development. Functional analysis of 24 GRAS genes and an orthologous relationship analysis indicated that many functioned during plant growth and flower and fruit development. Our bioinformatics analysis provides valuable information to improve the economic, agronomic and ecological benefits of P. mume and other Rosaceae fruit trees.

  10. Genome-wide analysis of the WRKY gene family in cotton.

    Science.gov (United States)

    Dou, Lingling; Zhang, Xiaohong; Pang, Chaoyou; Song, Meizhen; Wei, Hengling; Fan, Shuli; Yu, Shuxun

    2014-12-01

    WRKY proteins are major transcription factors involved in regulating plant growth and development. Although many studies have focused on the functional identification of WRKY genes, our knowledge concerning many areas of WRKY gene biology is limited. For example, in cotton, the phylogenetic characteristics, global expression patterns, molecular mechanisms regulating expression, and target genes/pathways of WRKY genes are poorly characterized. Therefore, in this study, we present a genome-wide analysis of the WRKY gene family in cotton (Gossypium raimondii and Gossypium hirsutum). We identified 116 WRKY genes in G. raimondii from the completed genome sequence, and we cloned 102 WRKY genes in G. hirsutum. Chromosomal location analysis indicated that WRKY genes in G. raimondii evolved mainly from segmental duplication followed by tandem amplifications. Phylogenetic analysis of alga, bryophyte, lycophyta, monocot and eudicot WRKY domains revealed family member expansion with increasing complexity of the plant body. Microarray, expression profiling and qRT-PCR data revealed that WRKY genes in G. hirsutum may regulate the development of fibers, anthers, tissues (roots, stems, leaves and embryos), and are involved in the response to stresses. Expression analysis showed that most group II and III GhWRKY genes are highly expressed under diverse stresses. Group I members, representing the ancestral form, seem to be insensitive to abiotic stress, with low expression divergence. Our results indicate that cotton WRKY genes might have evolved by adaptive duplication, leading to sensitivity to diverse stresses. This study provides fundamental information to inform further analysis and understanding of WRKY gene functions in cotton species.

  11. Genome-wide survey and characterization of the WRKY gene family in Populus trichocarpa.

    Science.gov (United States)

    He, Hongsheng; Dong, Qing; Shao, Yuanhua; Jiang, Haiyang; Zhu, Suwen; Cheng, Beijiu; Xiang, Yan

    2012-07-01

    WRKY transcription factors participate in diverse physiological and developmental processes in plants. They have highly conserved WRKYGQK amino acid sequences in their N-termini, followed by the novel zinc-finger-like motifs, Cys₂His₂ or Cys₂HisCys. To date, numerous WRKY genes have been identified and characterized in a number of herbaceous species. Survey and characterization of WRKY genes in a ligneous species would facilitate a better understanding of the evolutionary processes and functions of this gene family. In this study, 104 poplar WRKY genes (PtWRKY) were identified in the latest poplar genome sequence. According to their structural features, the predicted members were divided into the previously defined groups I-III, as described in rice. In addition, chromosomal localization of the genes demonstrated that there might be WRKY gene hot spots in 2.3 Mb regions on chromosome 14. Furthermore, approximately 83% (86 out of 104) of the WRKY genes participated in gene duplication events, including 69% (29 out of 42) gene pairs which exhibited segmental duplication. Using semi-quantitative RT-PCR, the expression patterns of subgroup III genes were investigated under different stresses [cold, drought, salinity and salicylic acid (SA)]. The data revealed that these genes presented different expression levels in response to various stress conditions. Expression analysis showed that the PtWRKY76 gene was markedly induced by 0.1 mM SA or 25% PEG-6000 treatment. The results presented here provide a fundamental clue for cloning specific function genes in further studies and applications. This study identified 104 poplar WRKY genes and demonstrated WRKY gene hot spots on chromosome 14. Furthermore, semi-quantitative RT-PCR showed variable stress responses in subgroup III.

  12. Diversification and evolution of the SDG gene family in Brassica rapa after the whole genome triplication.

    Science.gov (United States)

    Dong, Heng; Liu, Dandan; Han, Tianyu; Zhao, Yuxue; Sun, Ji; Lin, Sue; Cao, Jiashu; Chen, Zhong-Hua; Huang, Li

    2015-11-24

    Histone lysine methylation, controlled by the SET Domain Group (SDG) gene family, is part of the histone code that regulates chromatin function and epigenetic control of gene expression. Analyzing the SDG gene family in Brassica rapa for their gene structure, domain architecture, subcellular localization, rate of molecular evolution and gene expression pattern revealed common occurrences of subfunctionalization and neofunctionalization in BrSDGs. In comparison with Arabidopsis thaliana, the BrSDG gene family was found to be more divergent than AtSDGs, which might partly explain the rich variety of morphotypes in B. rapa. In addition, a new evolutionary pattern of the four main groups of SDGs was presented, in which the Trx group and the SUVR subgroup evolved faster than the E(z), Ash groups and the SUVH subgroup. These differences in evolutionary rate among the four main groups of SDGs are perhaps due to the complexity and variability of the regions that bind with biomacromolecules, which guide SDGs to their target loci.

  13. Rapid expansion of the protein disulfide isomerase gene family facilitates the folding of venom peptides

    DEFF Research Database (Denmark)

    Safavi-Hemami, Helena; Li, Qing; Jackson, Ronneshia L.

    2016-01-01

    Formation of correct disulfide bonds in the endoplasmic reticulum is a crucial step for folding proteins destined for secretion. Protein disulfide isomerases (PDIs) play a central role in this process. We report a previously unidentified, hypervariable family of PDIs that represents the most...... diverse gene family of oxidoreductases described in a single genus to date. These enzymes are highly expressed specifically in the venom glands of predatory cone snails, animals that synthesize a remarkably diverse set of cysteine-rich peptide toxins (conotoxins). Enzymes in this PDI family, termed...

  14. The nuclear IκB family of proteins controls gene regulation and immune homeostasis.

    Science.gov (United States)

    MaruYama, Takashi

    2015-10-01

    The inhibitory IκB family of proteins is subdivided into two groups based on protein localization in the cytoplasm or in the nucleus. These proteins interact with NF-κB, a major transcription factor regulating the expression of many inflammatory cytokines, by modulating its transcriptional activity. However, nuclear IκB family proteins not only interact with NF-κB to change its transcriptional activity, but they also bind to chromatin and control gene expression. This review provides an overview of nuclear IκB family proteins and their role in immune homeostasis. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. Annotating individual human genomes.

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

    2011-10-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. ANNOTATING INDIVIDUAL HUMAN GENOMES*

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

    2014-01-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162

  17. Comparative genomic analysis of the WRKY III gene family in populus, grape, arabidopsis and rice.

    Science.gov (United States)

    Wang, Yiyi; Feng, Lin; Zhu, Yuxin; Li, Yuan; Yan, Hanwei; Xiang, Yan

    2015-09-08

    WRKY III genes have significant functions in regulating plant development and resistance. The WRKY gene family has been studied in many plant species; however, a comprehensive analysis of WRKY III genes in the woody plant species poplar is still lacking. Three representative lineages of flowering plant species are incorporated in most analyses: Arabidopsis (a model plant for annual herbaceous dicots), grape (one model plant for perennial dicots) and Oryza sativa (a model plant for monocots). In this study, we identified 10, 6, 13 and 28 WRKY III genes in the genomes of Populus trichocarpa, grape (Vitis vinifera), Arabidopsis thaliana and rice (Oryza sativa), respectively. Phylogenetic analysis revealed that the WRKY III proteins could be divided into four clades. By microsynteny analysis, we found that the duplicated regions were more conserved between poplar and grape than Arabidopsis or rice. We dated their duplications by Ks analysis of Populus WRKY III genes and demonstrated that all the blocks were formed after the divergence of monocots and dicots. Strong purifying selection has played a key role in the maintenance of WRKY III genes in Populus. Tissue expression analysis of the WRKY III genes in Populus revealed that five were most highly expressed in the xylem. We also performed quantitative real-time reverse transcription PCR analysis of WRKY III genes in Populus treated with salicylic acid, abscisic acid and polyethylene glycol to explore their stress-related expression patterns. This study highlighted the duplication and diversification of the WRKY III gene family in Populus and provided a comprehensive analysis of this gene family in the Populus genome. Our results indicated that the majority of WRKY III genes of Populus were expanded by large-scale gene duplication. The expression patterns of the PtrWRKYIII genes indicated that these genes play important roles in the xylem during poplar growth and development, and may play a crucial role in defense to drought

  18. Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .

  19. Gene Environment Interactions and Predictors of Colorectal Cancer in Family-Based, Multi-Ethnic Groups

    Directory of Open Access Journals (Sweden)

    S. Pamela K. Shiao

    2018-02-01

    For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene–environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05, on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp except MTHFR A1298C). Four racial groups presented different polymorphism rates for four genes (all p < 0.05 except MTHFR A1298C). Following the ensemble method, the most influential factors were identified, and the best predictive models were generated by using the generalized regression models, with Akaike’s information criterion and leave-one-out cross validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and the exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene–environment interactions in the prevention of CRC.

  20. The grapevine kinome: annotation, classification and expression patterns in developmental processes and stress responses.

    Science.gov (United States)

    Zhu, Kaikai; Wang, Xiaolong; Liu, Jinyi; Tang, Jun; Cheng, Qunkang; Chen, Jin-Gui; Cheng, Zong-Ming Max

    2018-01-01

    Protein kinases (PKs) have evolved as the largest family of molecular switches that regulate protein activities associated with almost all essential cellular functions. Only a fraction of plant PKs, however, have been functionally characterized even in model plant species. In the present study, the entire grapevine kinome was identified and annotated using the most recent version of the grapevine genome. A total of 1168 PK-encoding genes were identified and classified into 20 groups and 121 families, with the RLK-Pelle group being the largest, with 872 members. The 1168 kinase genes were unevenly distributed over all 19 chromosomes, and both tandem and segmental duplications contributed to the expansion of the grapevine kinome, especially of the RLK-Pelle group. Ka/Ks values indicated that most of the tandem and segmental duplication events were under purifying selection. The grapevine kinome families exhibited different expression patterns during plant development and in response to various stress treatments, with many being coexpressed. The comprehensive annotation of grapevine kinase genes, their patterns of expression and coexpression, and the related information facilitate a more complete understanding of the roles of various grapevine kinases in growth and development, responses to abiotic stress, and evolutionary history.

  1. Genome-wide identification and expression analysis of the WRKY gene family in cassava

    Directory of Open Access Journals (Sweden)

    Yunxie Wei

    2016-02-01

    The WRKY family, a large family of transcription factors (TFs) found in higher plants, plays central roles in many aspects of physiological processes and adaption to environment. However, little information is available regarding the WRKY family in cassava (Manihot esculenta). In the present study, 85 WRKY genes were identified from the cassava genome and classified into three groups according to conserved WRKY domains and zinc-finger structure. Conserved motif analysis showed that all of the identified MeWRKYs had the conserved WRKY domain. Gene structure analysis suggested that the number of introns in MeWRKY genes varied from 1 to 5, with the majority of MeWRKY genes containing 3 exons. Expression profiles of MeWRKY genes in different tissues and in response to drought stress were analyzed using the RNA-seq technique. The results showed that 72 MeWRKY genes had differential expression in their transcript abundance and 78 MeWRKY genes were differentially expressed in response to drought stresses in different accessions, indicating their contribution to plant developmental processes and drought stress resistance in cassava. Finally, the expression of 9 WRKY genes was analyzed by qRT-PCR under osmotic, salt, ABA, H2O2, and cold treatments, indicating that MeWRKYs may be involved in different signaling pathways. Taken together, this systematic analysis identifies some tissue-specific and abiotic stress-responsive candidate MeWRKY genes for further functional assays in planta, and provides a solid foundation for understanding of abiotic stress responses and signal transduction mediated by WRKYs in cassava.

  2. Genome-Wide Identification and Expression Analysis of the WRKY Gene Family in Cassava.

    Science.gov (United States)

    Wei, Yunxie; Shi, Haitao; Xia, Zhiqiang; Tie, Weiwei; Ding, Zehong; Yan, Yan; Wang, Wenquan; Hu, Wei; Li, Kaimian

    2016-01-01

    The WRKY family, a large family of transcription factors (TFs) found in higher plants, plays central roles in many aspects of physiological processes and adaption to environment. However, little information is available regarding the WRKY family in cassava (Manihot esculenta). In the present study, 85 WRKY genes were identified from the cassava genome and classified into three groups according to conserved WRKY domains and zinc-finger structure. Conserved motif analysis showed that all of the identified MeWRKYs had the conserved WRKY domain. Gene structure analysis suggested that the number of introns in MeWRKY genes varied from 1 to 5, with the majority of MeWRKY genes containing three exons. Expression profiles of MeWRKY genes in different tissues and in response to drought stress were analyzed using the RNA-seq technique. The results showed that 72 MeWRKY genes had differential expression in their transcript abundance and 78 MeWRKY genes were differentially expressed in response to drought stresses in different accessions, indicating their contribution to plant developmental processes and drought stress resistance in cassava. Finally, the expression of 9 WRKY genes was analyzed by qRT-PCR under osmotic, salt, ABA, H2O2, and cold treatments, indicating that MeWRKYs may be involved in different signaling pathways. Taken together, this systematic analysis identifies some tissue-specific and abiotic stress-responsive candidate MeWRKY genes for further functional assays in planta, and provides a solid foundation for understanding of abiotic stress responses and signal transduction mediated by WRKYs in cassava.

  3. Three novel PHEX gene mutations in four Chinese families with X-linked dominant hypophosphatemic rickets

    Energy Technology Data Exchange (ETDEWEB)

    Kang, Qing-lin [Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Xu, Jia [Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Metabolic Bone Disease and Genetic Research Unit, Department of Osteoporosis and Bone Diseases, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Medical College of Soochow University, Suzhou, Jiangsu province 215000 (China); Zhang, Zeng [Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Metabolic Bone Disease and Genetic Research Unit, Department of Osteoporosis and Bone Diseases, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); He, Jin-wei [Metabolic Bone Disease and Genetic Research Unit, Department of Osteoporosis and Bone Diseases, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Lu, Lian-song [Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Medical College of Soochow University, Suzhou, Jiangsu province 215000 (China); Fu, Wen-zhen [Metabolic Bone Disease and Genetic Research Unit, Department of Osteoporosis and Bone Diseases, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China); Zhang, Zhen-lin, E-mail: zzl2002@medmail.com.cn [Metabolic Bone Disease and Genetic Research Unit, Department of Osteoporosis and Bone Diseases, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233 (China)

    2012-07-13

    Highlights: ► In our study, all of the patients were of Han Chinese ethnicity, which were rarely reported. ► We identified three novel PHEX gene mutations in four unrelated families with XLH. ► We found that the relationship between the phenotype and genotype of the PHEX gene was not invariant. ► We found that two PHEX gene sites, p.534 and p.731, were conserved. -- Abstract: Background: X-linked hypophosphatemia (XLH), the most common form of inherited rickets, is a dominant disorder that is characterized by renal phosphate wasting with hypophosphatemia, abnormal bone mineralization, short stature, and rachitic manifestations. The related gene with inactivating mutations associated with XLH has been identified as PHEX, which is a phosphate-regulating gene with homologies to endopeptidases on the X chromosome. In this study, a variety of PHEX mutations were identified in four Chinese families with XLH. Methods: We investigated four unrelated Chinese families who exhibited typical features of XLH by using PCR to analyze mutations that were then sequenced. The laboratory and radiological investigations were conducted simultaneously. Results: Three novel mutations were found in these four families: one frameshift mutation, c.2033dupT in exon 20, resulting in p.T679H; one nonsense mutation, c.1294A > T in exon 11, resulting in p.K432X; and one missense mutation, c.2192T > C in exon 22, resulting in p.F731S. Conclusions: We found that the PHEX gene mutations were responsible for XLH in these Chinese families. Our findings are useful for understanding the genetic basis of Chinese patients with XLH.

  4. Three novel PHEX gene mutations in four Chinese families with X-linked dominant hypophosphatemic rickets

    International Nuclear Information System (INIS)

    Kang, Qing-lin; Xu, Jia; Zhang, Zeng; He, Jin-wei; Lu, Lian-song; Fu, Wen-zhen; Zhang, Zhen-lin

    2012-01-01

    Highlights: ► In our study, all of the patients were of Han Chinese ethnicity, which were rarely reported. ► We identified three novel PHEX gene mutations in four unrelated families with XLH. ► We found that the relationship between the phenotype and genotype of the PHEX gene was not invariant. ► We found that two PHEX gene sites, p.534 and p.731, were conserved. -- Abstract: Background: X-linked hypophosphatemia (XLH), the most common form of inherited rickets, is a dominant disorder that is characterized by renal phosphate wasting with hypophosphatemia, abnormal bone mineralization, short stature, and rachitic manifestations. The related gene with inactivating mutations associated with XLH has been identified as PHEX, which is a phosphate-regulating gene with homologies to endopeptidases on the X chromosome. In this study, a variety of PHEX mutations were identified in four Chinese families with XLH. Methods: We investigated four unrelated Chinese families who exhibited typical features of XLH by using PCR to analyze mutations that were then sequenced. The laboratory and radiological investigations were conducted simultaneously. Results: Three novel mutations were found in these four families: one frameshift mutation, c.2033dupT in exon 20, resulting in p.T679H; one nonsense mutation, c.1294A > T in exon 11, resulting in p.K432X; and one missense mutation, c.2192T > C in exon 22, resulting in p.F731S. Conclusions: We found that the PHEX gene mutations were responsible for XLH in these Chinese families. Our findings are useful for understanding the genetic basis of Chinese patients with XLH.

  5. Comprehensive Genomic Identification and Expression Analysis of the Phosphate Transporter (PHT) Gene Family in Apple.

    Science.gov (United States)

    Sun, Tingting; Li, Mingjun; Shao, Yun; Yu, Lingyan; Ma, Fengwang

    2017-01-01

    Elemental phosphorus (Pi) is essential to plant growth and development. The family of phosphate transporters (PHTs) mediates the uptake and translocation of Pi inside the plants. Members include five sub-cellular phosphate transporters that play different roles in Pi uptake and transport. We searched the Genome Database for Rosaceae and identified five clusters of phosphate transporters in apple (Malus domestica), including 37 putative genes. The MdPHT1 family contains 14 genes while MdPHT2 has two, MdPHT3 has seven, MdPHT4 has 11, and MdPHT5 has three. Our overview of this gene family focused on structure, chromosomal distribution and localization, phylogenies, and motifs. These genes displayed differential expression patterns in various tissues. For example, expression was high for MdPHT1;12, MdPHT3;6, and MdPHT3;7 in the roots, and was also increased in response to low-phosphorus conditions. In contrast, MdPHT4;1, MdPHT4;4, and MdPHT4;10 were expressed only in the leaves while transcript levels of MdPHT1;4, MdPHT1;12, and MdPHT5;3 were highest in flowers. In general, these 37 genes were regulated significantly in either roots or leaves in response to the imposition of phosphorus and/or drought stress. The results suggest that members of the PHT family function in plant adaptations to adverse growing environments. Our study will lay a foundation for better understanding the PHT family evolution and exploring genes of interest for genetic improvement in apple.

  6. Novel mutations in Norrie disease gene in Japanese patients with Norrie disease and familial exudative vitreoretinopathy.

    Science.gov (United States)

    Kondo, Hiroyuki; Qin, Minghui; Kusaka, Shunji; Tahira, Tomoko; Hasebe, Haruyuki; Hayashi, Hideyuki; Uchio, Eiichi; Hayashi, Kenshi

    2007-03-01

    To search for mutations in the Norrie disease gene (NDP) in Japanese patients with familial exudative vitreoretinopathy (FEVR) and Norrie disease (ND) and to delineate the mutation-associated clinical features. Direct sequencing after polymerase chain reaction of all exons of the NDP gene was performed on blood collected from 62 probands (31 familial and 31 simplex) with FEVR, from 3 probands with ND, and from some of their family members. The clinical symptoms and signs in the patients with mutations were assessed. X-inactivation in the female carriers was examined in three FEVR families by using leukocyte DNA. Four novel mutations-I18K, K54N, R115L, and IVS2-1G-->A-and one reported mutation, R97P, in the NDP gene were identified in six families. The severity of vitreoretinopathy varied among these patients. Three probands with either K54N or R115L had typical features of FEVR, whereas the proband with R97P had those of ND. Families with IVS2-1G-->A exhibited either ND or FEVR characteristics. A proband with I18K presented with significant phenotypic heterogeneity between the two eyes. In addition, affected female carriers in a family harboring the K54N mutation presented with different degrees of vascular abnormalities in the periphery of the retina. X-inactivation profiles indicated that the skewing was not significantly different between affected and unaffected women. These observations indicate that mutations of the NDP gene can cause ND and 6% of FEVR cases in the Japanese population. The X-inactivation assay with leukocytes may not be predictive of the presence of a mutation in affected female carriers.

  7. GSV Annotated Bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2010-09-14

    The following annotated bibliography was developed as part of the geospatial algorithm verification and validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models. Many other papers were studied during the course of the investigation. The annotations for these articles can be found in the paper "On the verification and validation of geospatial image analysis algorithms".

  8. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information on country of collection, geographical coordinates, interacting taxon and isolation source was added to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  9. Genomic sequence and organization of two members of a human lectin gene family

    International Nuclear Information System (INIS)

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family.

  10. Evolutionary Relationship and Structural Characterization of the EPF/EPFL Gene Family

    OpenAIRE

    Takata, Naoki; Yokota, Kiyonobu; Ohki, Shinya; Mori, Masashi; Taniguchi, Toru; Kurita, Manabu

    2013-01-01

    EPF1-EPF2 and EPFL9/Stomagen act antagonistically in regulating leaf stomatal density. The aim of this study was to elucidate the evolutionary functional divergence of EPF/EPFL family genes. Phylogenetic analyses showed that AtEPFL9/Stomagen-like genes are conserved only in vascular plants and are closely related to AtEPF1/EPF2-like genes. Modeling showed that EPF/EPFL peptides share a common 3D structure that is constituted of a scaffold and loop. Molecular dynamics simulation suggested that...

  11. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

    DEFF Research Database (Denmark)

    Muller, J; Szklarczyk, D; Julien, P

    2010-01-01

    The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage ... of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional ... descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2,242,035 proteins (built from...
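
    The "reciprocal best BLAST matches" step mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the eggNOG implementation: the hit tuples, scores and names (`A_VS_B`, `B_VS_A`, `best_hits`, `reciprocal_best_hits`) are hypothetical, and the triangular-linkage step that merges pairs into groups is not shown.

```python
from typing import Dict, List, Set, Tuple

# One hit = (query protein, subject protein, bit score).
# The data below are made-up toy values, not real BLAST output.
Hit = Tuple[str, str, float]

A_VS_B: List[Hit] = [
    ("a1", "b1", 200.0),
    ("a1", "b2", 150.0),
    ("a2", "b2", 90.0),
    ("a3", "b1", 120.0),
]
B_VS_A: List[Hit] = [
    ("b1", "a1", 198.0),
    ("b1", "a3", 110.0),
    ("b2", "a2", 95.0),
    ("b2", "a1", 80.0),
]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> Set[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


pairs = reciprocal_best_hits(A_VS_B, B_VS_A)
```

    In this toy data, a3's best hit is b1, but b1's best hit is a1, so a3 is left unpaired; only mutually best pairs are candidates for an orthologous group.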

  12. Exome sequencing of Pakistani consanguineous families identifies 30 novel candidate genes for recessive intellectual disability.

    Science.gov (United States)

    Riazuddin, S; Hussain, M; Razzaq, A; Iqbal, Z; Shahzad, M; Polla, D L; Song, Y; van Beusekom, E; Khan, A A; Tomas-Roca, L; Rashid, M; Zahoor, M Y; Wissink-Lindhout, W M; Basra, M A R; Ansar, M; Agha, Z; van Heeswijk, K; Rasheed, F; Van de Vorst, M; Veltman, J A; Gilissen, C; Akram, J; Kleefstra, T; Assir, M Z; Grozeva, D; Carss, K; Raymond, F L; O'Connor, T D; Riazuddin, S A; Khan, S N; Ahmed, Z M; de Brouwer, A P M; van Bokhoven, H; Riazuddin, S

    2017-11-01

    Intellectual disability (ID) is a clinically and genetically heterogeneous disorder, affecting 1-3% of the general population. Although research into the genetic causes of ID has recently gained momentum, identification of pathogenic mutations that cause autosomal recessive ID (ARID) has lagged behind, predominantly due to non-availability of sizeable families. Here we present the results of exome sequencing in 121 large consanguineous Pakistani ID families. In 60 families, we identified homozygous or compound heterozygous DNA variants in a single gene, 30 affecting reported ID genes and 30 affecting novel candidate ID genes. Potential pathogenicity of these alleles was supported by co-segregation with the phenotype, low frequency in control populations and the application of stringent bioinformatics analyses. In another eight families, segregation of multiple pathogenic variants was observed, affecting 19 genes that were either known or are novel candidates for ID. Transcriptome profiles of normal human brain tissues showed that the novel candidate ID genes formed a network significantly enriched for transcriptional co-expression (P<0.0001) in the frontal cortex during fetal development and in the temporal-parietal and sub-cortex during infancy through adulthood. In addition, proteins encoded by 12 novel ID genes directly interact with previously reported ID proteins in six known pathways essential for cognitive function (P<0.0001). These results suggest that disruptions of temporal-parietal and sub-cortical neurogenesis during infancy are critical to the pathophysiology of ID. These findings further expand the existing repertoire of genes involved in ARID, and provide new insights into the molecular mechanisms and the transcriptome map of ID.

  13. Whole genome duplications and expansion of the vertebrate GATA transcription factor gene family

    Directory of Open Access Journals (Sweden)

    Bowerman Bruce

    2009-08-01

    Background: GATA transcription factors influence many developmental processes, including the specification of embryonic germ layers. The GATA gene family has significantly expanded in many animal lineages: whereas diverse cnidarians have only one GATA transcription factor, six GATA genes have been identified in many vertebrates, five in many insects, and eleven to thirteen in Caenorhabditis nematodes. All bilaterian animal genomes have at least one member each of two classes, GATA123 and GATA456. Results: We have identified one GATA123 gene and one GATA456 gene from the genomic sequence of two invertebrate deuterostomes, a cephalochordate (Branchiostoma floridae) and a hemichordate (Saccoglossus kowalevskii). We also have confirmed the presence of six GATA genes in all vertebrate genomes, as well as additional GATA genes in teleost fish. Analyses of conserved sequence motifs and of changes to the exon-intron structure, and molecular phylogenetic analyses of these deutero