Lopez, D. [UCLA]; Casero, D. [UCLA]; Cokus, S. J. [UCLA]; Merchant, S. S. [UCLA]; Pellegrini, M. [UCLA]
The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.
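The functional-term enrichment this abstract describes is conventionally computed with a hypergeometric test: given a gene list, how surprising is the number of list members carrying a given annotation term? A minimal stdlib sketch follows; the function name and the example counts are illustrative, not taken from the tool itself.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): probability that a random list of n genes contains at
    least k genes carrying a term, when K of the N genes in the genome
    carry it. Small values indicate over-representation (enrichment)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Illustrative numbers: a 15000-gene genome, 40 genes carry a pathway
# term, and a 100-gene list of interest contains 6 of them.
p = hypergeom_enrichment_p(6, 100, 40, 15000)
print(f"p = {p:.2e}")  # far below typical significance cutoffs
```

In practice an enrichment tool runs this test once per annotation term and then corrects the resulting p-values for multiple testing.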
Maeda, Kazuaki; Bird, Steven; Ma, Xiaoyi; Lee, Haejoong
The Annotation Graph Toolkit is a collection of software supporting the development of annotation tools based on the annotation graph model. The toolkit includes application programming interfaces for manipulating annotation graph data and for importing data from other formats. There are interfaces for the scripting languages Tcl and Python, a database interface, specialized graphical user interfaces for a variety of annotation tasks, and several sample applications. This paper describes all ...
Kalkatawi, Manal Matoq Saeed
Background Genome annotation is one way of summarizing existing knowledge about the genomic characteristics of an organism. Interest in computer-based structural and functional genome annotation has grown over the last several decades, and many methods have been developed for both eukaryotes and prokaryotes. Our study focuses on the comparison of functional annotations of prokaryotic genomes. To the best of our knowledge, there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The existence of many AMs and the development of new ones create the need to (a) compare different annotations of a single genome, and (b) generate an annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides a detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations by combining individual ones. To illustrate BEACON's utility, we provide a comparison analysis of multiple annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while reducing the number of genes without any function assignment. Conclusions We developed BEACON, a fast tool for automated, systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes of unknown function. BEACON is available under the GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
Hend S. Al-Khalifa; Davis, Hugh C.
There are many resources on the Web which are suitable for educational purposes. Unfortunately, the task of identifying suitable resources for a particular educational purpose is difficult, as they have not typically been annotated with educational metadata. However, many resources have now been annotated in an unstructured manner within contemporary social bookmarking services. This paper describes a novel tool called ‘FolksAnnotation’ that creates annotations with educational semantics from th...
Fujita, A; Massirer, K B; Durham, A M; Ferreira, C E; Sogayar, M C
Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources, and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet, or run locally if a large number of sequences are to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Installation and application of annotation systems usually require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic informatics skills to use it without special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. The minimum free disk space required is 2 MB. PMID:16258624
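GATO itself is written in PHP and Perl; the caching pattern the abstract describes, querying a remote resource only when the local database has no record of a gene, can be sketched in Python with the stdlib `sqlite3` module. The fetch function below is a hypothetical stand-in for GATO's actual web queries.

```python
import sqlite3

def fetch_remote_annotation(gene_id):
    # Hypothetical placeholder for the web-resource query GATO performs.
    return f"annotation for {gene_id}"

def annotate(conn, gene_id):
    """Return a cached annotation, querying the remote resource on a miss."""
    row = conn.execute("SELECT annotation FROM cache WHERE gene_id = ?",
                       (gene_id,)).fetchone()
    if row:
        return row[0]  # previous result kept in the local database
    annotation = fetch_remote_annotation(gene_id)
    conn.execute("INSERT INTO cache (gene_id, annotation) VALUES (?, ?)",
                 (gene_id, annotation))
    conn.commit()
    return annotation

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (gene_id TEXT PRIMARY KEY, annotation TEXT)")
print(annotate(conn, "BRCA2"))  # first call: remote fetch, then stored
print(annotate(conn, "BRCA2"))  # second call: served from the local database
```

Keeping every previous result in the database is what lets a group repeatedly consult annotations for a common set of genes without re-querying the web.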
Douglas, Kathy A.; Lang, Josephine; Colasante, Meg
Blended learning has been evolving as an important approach to learning and teaching in tertiary education. This approach incorporates learning in both online and face-to-face modes and promotes deep learning by incorporating the best of both approaches. An innovation in blended learning is the use of an online media annotation tool (MAT) in…
Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U;
SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses. AVAILABILITY: SmashCommunity is freely available at http://www.bork.embl.de/software/smash CONTACT: email@example.com
This thesis proposes a new methodology for building and using knowledge-based models for automatic image annotation. More precisely, we first propose approaches for the automatic construction of explicit, structured knowledge models, namely semantic hierarchies and multimedia ontologies suited to image annotation. We thus propose an approach for the automatic construction of hierarchi...
Bhatia, Vivek N; Perlman, David H; Costello, Catherine E; McComb, Mark E
In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP), a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP results include meta-information on the protein in addition to complementary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger data set. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie charts (biological process, cellular component, and molecular function) and bar charts (cross-comparison of sample sets) to aid in the interpretation of large data sets and differential analysis experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry
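The pie charts this abstract mentions rest on a simple tally: each GO term belongs to one of three aspects (biological process, cellular component, molecular function), and the chart shows how a protein list's terms distribute across them. A minimal sketch, with hypothetical hard-coded assignments standing in for the UniProtKB/GOA lookups the tool actually performs:

```python
from collections import Counter

# Hypothetical (accession, GO term, aspect) records for illustration only;
# STRAP retrieves these from the UniProtKB and EBI GOA databases.
annotations = [
    ("P04637", "DNA repair", "biological process"),
    ("P04637", "nucleus", "cellular component"),
    ("P04637", "DNA binding", "molecular function"),
    ("P38398", "DNA repair", "biological process"),
    ("P38398", "protein binding", "molecular function"),
]

def aspect_counts(records):
    """Tally GO-term assignments per aspect; these counts are what a
    per-aspect pie chart would display."""
    return Counter(aspect for _accession, _term, aspect in records)

counts = aspect_counts(annotations)
print(dict(counts))
```

The same tally computed separately for two sample sets gives the cross-comparison bar chart described above.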
Liu, Chia-Hsin; Ho, Bing-Ching; Chen, Chun-Ling; Chang, Ya-Hsuan; Hsu, Yi-Chiung; Li, Yu-Cheng; Yuan, Shin-Sheng; Huang, Yi-Huan; Chang, Chi-Sheng; Li, Ker-Chau; Chen, Hsuan-Yu
Recently, with the development of next generation sequencing (NGS), the combination of chromatin immunoprecipitation (ChIP) and NGS, namely ChIP-seq, has become a powerful technique to capture potential genomic binding sites of regulatory factors, histone modifications and chromatin accessible regions. For most researchers, additional information including genomic variations on the TF binding site, allele frequency of variation between different populations, variation-associated disease, and other neighbouring TF binding sites is essential to generate a proper hypothesis or a meaningful conclusion. Many ChIP-seq datasets have been deposited in the public domain to help researchers make new discoveries. However, researchers are often intimidated by the complexity of the data structure and the large data volume. Such information would be more useful if it could be combined or downloaded with ChIP-seq data. To meet such demands, we built a webtool: ePIgenomic ANNOtation tool (ePIANNO, http://epianno.stat.sinica.edu.tw/index.html). ePIANNO is a web server that combines SNP information of populations (1000 Genomes Project) and gene-disease association information of GWAS (NHGRI) with ChIP-seq (hmChIP, ENCODE, and ROADMAP epigenomics) data. ePIANNO has a user-friendly website interface allowing researchers to explore, navigate, and extract data quickly. We use two examples to demonstrate how users could use the functions of the ePIANNO webserver to explore useful information about TF-related genomic variants. Users could use our query functions to search target regions, transcription factors, or annotations. ePIANNO may help users to generate hypotheses or explore potential biological functions for their studies. PMID:26859295
The purpose of the study was to understand student interaction and learning supported by a collaborative social annotation tool, Diigo. Through a case study, the researcher examined how students participated and interacted when studying an online text with the social annotation tool, and how they perceived their experience. The findings…
Automatic image annotation is a field of image processing that automatically associates keywords or text with images based on their content, so that images can subsequently be retrieved by textual query. Automatic image annotation seeks to overcome the shortcomings of the two other current approaches to retrieving images from a textual query. The first consists of manually annotating the images, which is no longer feasible...
Futrelle, J.; York, A.
We present a design for a web-based image annotation interface developed to assist in supervised classification of organisms and substrate for habitat assessment from multiple, heterogeneous oceanographic imaging systems. The interface enables human image annotators to count, identify, and measure targets and classify substrate in a variety of imagery, including benthic surveys and imaging flow cytometry. These annotations are then used to build training sets for supervised classification algorithms for purposes of characterizing community structure and habitat assessment. The Ocean Imaging Informatics team at WHOI used the Tetherless World Constellation's collaborative design methodology to develop a shared formal information model and system design that applies to a variety of image annotation use cases. Because the information model represents consensus between researchers with differing instrumentation and science needs, it assists with rapid prototyping and establishes a baseline against which existing and forthcoming image annotation tools can be evaluated. A technology review suggested that there are few general-purpose image annotation tools suitable for annotation of high-volume oceanographic imagery. Most tools require too many steps for operations that must be repeated thousands of times, and/or lack critical features such as display of instrument metadata, QA/QC, and management of annotator tasks. While some of these problems are user interface limitations, others suggest that existing tools are missing critically important concepts. For example, QA/QC appears in our information model as an "activity stream" associated with each image annotation, consisting of events indicating review status, specific image quality issues, etc. The model also includes "identification modes" that contextualize annotations according to the annotator's assigned task, assisting both with interpreting annotations and with providing contextual user interface shortcuts
Tang, Peter Torben; Jensen, Jens Dahl
Throughout the evolution of microelectronics, the electrochemical deposition technique has played a vital role in the manufacturing of printed circuit boards (PCBs), in the deposition of materials for packaging processes such as flip-chip bonding, and recently also in the fabrication of interconnects for ultra large scale integrated (ULSI) circuits. During the last 5 to 10 years electrochemical deposition has also attracted attention as a tool for manufacturing of micro electromechanical systems (MEMS), microfluidic and optical systems (creating inserts for the injection moulding of polymer optics or microfluidic systems) and many other areas within the field of micro systems technology (MST). The present paper describes some applications for which electrodeposition of various metals has been utilised for manufacturing of MEMS and modern interconnect components.
A novel visualized sound description, called the sound dendrogram, is proposed to make manual annotation easier when building large speech corpora. It is a lattice structure built from a group of "seed regions" through an iterative merging procedure. A simple but reliable extraction method for "seed regions" and an advanced distance metric are adopted to construct the sound dendrogram, so that it can present the structure of speech at scales ranging from coarse to fine in a visualized way. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are very easy to identify. The sound dendrogram can be a powerful assistant tool during the manual annotation of speech corpora.
Makarov, Vladimir; O'Grady, Tina; Cai, Guiqing; Lihm, Jayon; Buxbaum, Joseph D; Yoon, Seungtai
Summary: AnnTools is a versatile bioinformatics application designed for comprehensive annotation of a full spectrum of human genome variation: novel and known single-nucleotide substitutions (SNP/SNV), short insertions/deletions (INDEL) and structural variants/copy number variation (SV/CNV). The variants are interpreted by interrogating data compiled from 15 constantly updated sources. In addition to detailed functional characterization of the coding variants, AnnTools searches for overlaps ...
Navas-Delgado, Ismael; García Godoy, María Jesús; Arjona-Pulido, Fátima; Castillo-Castillo, Trinidad; Ramos-Ostio, Ana Isabel; Ifantes Díaz, Sarai; Medina García, Ana; Aldana-Montes, José F.
Currently, clinical interpretation of whole-genome NGS genetic findings is very low-throughput because of a lack of computational tools/software. The current bottleneck of whole-genome and whole-exome sequencing projects is in structured data management and sophisticated computational analysis of experimental data. In this work, we have started designing a platform for integrating, as a first step, existing analysis tools and adding annotations from public databases to the findings of these ...
D. K. Iakovidis
Image segmentation and annotation are key components of image-based medical computer-aided diagnosis (CAD) systems. In this paper we present Ratsnake, a publicly available generic image annotation tool providing annotation efficiency, semantic awareness, versatility, and extensibility, features that can be exploited to transform it into an effective CAD system. In order to demonstrate this unique capability, we present its novel application for the evaluation and quantification of salient objects and structures of interest in kidney biopsy images. Accurate annotation identifying and quantifying such structures in microscopy images can provide an estimation of pathogenesis in obstructive nephropathy, which is a rather common disease with severe implications in children and infants. However, a tool for detecting and quantifying the disease is not yet available. A machine learning-based approach, which utilizes prior domain knowledge and textural image features, is considered for the generation of an image force field customizing the presented tool for automatic evaluation of kidney biopsy images. The experimental evaluation of the proposed application of Ratsnake demonstrates its efficiency and effectiveness and promises its wide applicability across a variety of medical imaging domains.
Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik;
MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large scale sequencing projects. The increasing availability of sequence trace data in public repositories makes it feasible to evaluate SNP predictions on the DNA chromatogram level. MAVIANT, a platform-independent Multipurpose Alignment VIewing and Annotation Tool, provides DNA chromatogram and alignment views and facilitates evaluation of predictions. In addition, it supports direct manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non...
Merchant Sabeeha S
Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.
Somaroo Shyamal; Wang Yaoyu E; Hong Eun-Jong; Ocano Marco; Mathur Vidhya; Dana Paul H; Caffrey Daniel R; Caffrey Brian E; Potluri Shobha; Huang Enoch S
Abstract Background By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Results Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple a...
Van Dyk Derek
Abstract Background There is an ever increasing rate of data made available on genetic variation, transcriptomes and proteomes. Similarly, a growing variety of bioinformatic programs are becoming available from many diverse sources, designed to identify a myriad of sequence patterns considered to have potential biological importance within inter-genic regions, genes, transcripts, and proteins. However, biologists require easy to use, uncomplicated tools to integrate this information, visualise and print gene annotations. Integrating this information usually requires considerable informatics skills, and comprehensive knowledge of the data format to make full use of this information. Tools are needed to explore gene model variants by allowing users the ability to create alternative transcript models using novel combinations of exons not necessarily represented in current database deposits of mRNA/cDNA sequences. Results Djinn Lite is designed to be an intuitive program for storing and visually exploring custom annotations relating to a eukaryotic gene sequence and its modelled gene products. In particular, it is helpful in developing hypotheses regarding alternative splicing of transcripts by allowing the construction of model transcripts and inspection of their resulting translations. It facilitates the ability to view a gene and its gene products in one synchronised graphical view, allowing one to drill down into sequence-related data. Colour highlighting of selected sequences and added annotations further supports exploration and visualisation of sequence regions and motifs known or predicted to be biologically significant. Conclusion Gene annotation remains an ongoing and challenging task that will continue as gene structures, gene transcription repertoires, disease loci, protein products and their interactions become more precisely defined. Djinn Lite offers an accessible interface to help accumulate, enrich, and individualise sequence
Abstract Background In the last years several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However, some of these cDNAs are error prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first requires the identification of potentially problematic cDNAs in order to speed up the subsequent annotation process. Results cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up-to-date databases, especially in the case of ESTs and mRNAs, in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis such as program pipelines, or the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in a graphical form. cDNA2Genome has been implemented under the W3H task framework which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis. Conclusions cDNA2Genome represents a new versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.
Abstract Background High-throughput genome biological experiments yield large and multifaceted datasets that require flexible and user-friendly analysis tools to facilitate their interpretation by life scientists. Many solutions currently exist, but they are often limited to specific steps in the complex process of data management and analysis and some require extensive informatics skills to be installed and run efficiently. Results We developed the Annotation, Mapping, Expression and Network (AMEN) software as a stand-alone, unified suite of tools that enables biological and medical researchers with basic bioinformatics training to manage and explore genome annotation, chromosomal mapping, protein-protein interaction, expression profiling and proteomics data. The current version provides modules for (i) uploading and pre-processing data from microarray expression profiling experiments, (ii) detecting groups of significantly co-expressed genes, and (iii) searching for enrichment of functional annotations within those groups. Moreover, the user interface is designed to simultaneously visualize several types of data such as protein-protein interaction networks in conjunction with expression profiles and cellular co-localization patterns. We have successfully applied the program to interpret expression profiling data from budding yeast, rodents and human. Conclusion AMEN is an innovative solution for molecular systems biology data analysis, freely available under the GNU license. The program is available via a website at the Sourceforge portal which includes a user guide with concrete examples, links to external databases and helpful comments to implement additional functionalities. We emphasize that AMEN will continue to be developed and maintained by our laboratory because it has proven to be extremely useful for our genome biological research program.
Full Text Available Abstract Background Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. Results We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. Conclusion Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.
The Halden Viewer is a software tool that can be used to interactively walk through and annotate 3D models of environments while visualising radiation dose-rate data. Paths can be defined through the environment and graphs can be plotted showing the dose-rate and accumulated dose of a human moving along the paths. Typical users include radiation protection staff, staff involved in maintenance and training activities, and managers. This report comprises a foreword documenting the history of the Halden Viewer and the Halden Viewer User Manual. The user manual contains an overview of the tool, a tutorial, and reference information. The version of the Halden Viewer described in this report is release 2.0. (Author)
Full Text Available Abstract Background Recently, with the maturing of bovine genome sequencing and high-throughput SNP genotyping technologies, a large number of significant SNPs associated with economically important traits can be identified by genome-wide association studies (GWAS). To further confirm association findings from GWAS, the common strategy is to sift out the most promising SNPs for follow-up replication studies. Hence it is crucial to explore the functional significance of the candidate SNPs in order to screen and select the potentially functional ones. To systematically prioritize these statistically significant SNPs and facilitate follow-up replication studies, we developed a bovine SNP annotation tool (Snat) based on a web interface. Results With Snat, various sources of genomic information are integrated and retrieved from several leading online databases, including SNP information from dbSNP, gene information from Entrez Gene, protein features from UniProt, linkage information from AnimalQTLdb, conserved elements from the UCSC Genome Browser Database and gene functions from Gene Ontology (GO), KEGG PATHWAY and Online Mendelian Inheritance in Animals (OMIA). Snat provides two different applications, including a CGI-based web utility and a command-line version, to access the integrated database, target any single-nucleotide loci of interest and perform multi-level functional annotations. For further validation of the practical significance of our study, SNPs included in two commercial bovine SNP chips, i.e., the Affymetrix Bovine 10K chip array and the Illumina 50K chip array, have been annotated by Snat, and the corresponding outputs can be directly downloaded from the Snat website. Furthermore, a real dataset involving 20 identified SNPs associated with milk yield in our recent GWAS was employed to demonstrate the practical significance of Snat. Conclusions To the best of our knowledge, Snat is one of the first tools focusing on SNP annotation for livestock. Snat confers
Full Text Available Abstract Background The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments. Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping
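The kind of flat-file join described above (probe to gene, gene to ortholog, ortholog back to a probe on another platform) can be sketched in a few lines. This is a toy illustration, not the annotationTools R API; every file content, probe ID and gene symbol below is invented:

```python
import csv
import io

# Hypothetical tab-delimited flat-file annotation tables.
mouse_ann = "probe\tsymbol\n1417409_at\tJun\n1448694_at\tFos\n"
ortho = "mouse\thuman\nJun\tJUN\nFos\tFOS\n"
human_ann = "probe\tsymbol\n201466_s_at\tJUN\n209189_at\tFOS\n"

def load_map(text, key, value):
    """Read a two-column mapping out of a tab-delimited flat file."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {r[key]: r[value] for r in rows}

probe2sym = load_map(mouse_ann, "probe", "symbol")
sym2orth = load_map(ortho, "mouse", "human")
orth2probe = {v: k for k, v in load_map(human_ann, "probe", "symbol").items()}

def cross_species(probe):
    """Map a mouse probe to a human probe via gene orthology; None if unmapped."""
    return orth2probe.get(sym2orth.get(probe2sym.get(probe)))
```

Chaining `dict.get` calls keeps unmapped probes as `None` at every step, mirroring how missing annotation entries propagate through a real cross-platform mapping.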
Full Text Available Abstract Background Publicly available datasets of microarray gene expression signals represent an unprecedented opportunity for extracting genomic relevant information and validating biological hypotheses. However, the exploitation of this exceptionally rich mine of information is still hampered by the lack of appropriate computational tools, able to overcome the critical issues raised by meta-analysis. Results This work presents A-MADMAN, an open source web application which allows the retrieval, annotation, organization and meta-analysis of gene expression datasets obtained from Gene Expression Omnibus. A-MADMAN addresses and resolves several open issues in the meta-analysis of gene expression data. Conclusion A-MADMAN allows (i) the batch retrieval from Gene Expression Omnibus and the local organization of raw data files and of any related meta-information, (ii) the re-annotation of samples to fix incomplete, or otherwise inadequate, metadata and to create user-defined batches of data, and (iii) the integrative analysis of data obtained from different Affymetrix platforms through custom chip definition files and meta-normalization. Software and documentation are available on-line at http://compgen.bio.unipd.it/bioinfo/amadman/.
Current market trends are moving from large quantity production towards small batch production and mass customization. This has led to the high demand for the flexibility and adaptability of manufacturing technology and systems. Several reconfigurable pin type tooling systems have been proposed and developed to satisfy such demands. However, these reconfigurable tooling systems still suffer from several drawbacks, including difficulties associated with positioning and locking the pins and ...
Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew
A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation, performed on the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and the results were compared to the reference annotation.
Nikolaichik, Yevgeny; Damienikan, Aliaksandr U
The majority of bacterial genome annotations are currently automated and based on a 'gene by gene' approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn't fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541
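The regulatory-protein binding-site profiles SigmoID constructs are, at their core, position weight matrices scored against candidate sequences. A simplified log-odds sketch of that scoring (SigmoID itself wraps dedicated command-line tools; the motif counts below are invented and assume equal totals per column):

```python
from math import log2

# Invented count matrix for a 4-position motif (rows = bases, columns = positions);
# each column sums to 10 observed sites.
counts = {
    "A": [8, 0, 1, 9],
    "C": [1, 9, 0, 0],
    "G": [0, 1, 8, 0],
    "T": [1, 0, 1, 1],
}

def pwm_from_counts(counts, pseudo=1.0, background=0.25):
    """Log-odds PWM vs a uniform background; assumes equal column totals."""
    total = sum(counts[b][0] for b in "ACGT") + 4 * pseudo
    return {b: [log2((c + pseudo) / total / background) for c in counts[b]]
            for b in "ACGT"}

PWM = pwm_from_counts(counts)
WIDTH = len(PWM["A"])

def best_hit(seq):
    """Return (score, offset) of the best motif match on the forward strand."""
    return max((sum(PWM[seq[i + j]][j] for j in range(WIDTH)), i)
               for i in range(len(seq) - WIDTH + 1))

score, pos = best_hit("TTACGATT")  # consensus ACGA sits at offset 2
```

A genome-scale search would slide the same scoring window over both strands and keep hits above a calibrated threshold.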
Motion, Graham B; Howden, Andrew J M; Huitema, Edgar; Jones, Susan
There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes. PMID:26304539
Schaafsma, Gerard C P; Vihinen, Mauno
The Variation Ontology (VariO) is used for describing and annotating types, effects, consequences, and mechanisms of variations. To facilitate easy and consistent annotations, the online application VariOtator was developed. For variation type annotations, VariOtator is fully automated, accepting variant descriptions in Human Genome Variation Society (HGVS) format, and generating VariO terms, either with or without full lineage, that is, all parent terms. When a coding DNA variant description with a reference sequence is provided, VariOtator checks the description first with Mutalyzer and then generates the predicted RNA and protein descriptions with their respective VariO annotations. For the other sublevels, function, structure, and property, annotations cannot be automated, and VariOtator generates annotation based on provided details. For VariO terms relating to structure and property, one can use attribute terms as modifiers and evidence code terms for annotating experimental evidence. There is an online batch version, and stand-alone batch versions to be used with a Leiden Open Variation Database (LOVD) download file. A SOAP Web service allows client programs to access VariOtator programmatically. Thus, systematic variation effect and type annotations can be efficiently generated to allow easy use and integration of variations and their consequences. PMID:26773573
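The automated variation-type annotation described above starts from parsing HGVS descriptions. A toy parser for the simplest case, coding-DNA substitutions such as "c.76A>T" (real HGVS covers far more variant kinds, and VariOtator validates descriptions with Mutalyzer; the returned labels here are illustrative, not actual VariO terms):

```python
import re

# Matches only simple coding-DNA substitutions, e.g. "c.76A>T".
HGVS_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")

def classify(description):
    """Parse an HGVS coding-DNA substitution and label the base change;
    returns None for descriptions this toy parser does not cover."""
    m = HGVS_SUB.match(description)
    if not m:
        return None
    pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    kind = "transition" if {ref, alt} in ({"A", "G"}, {"C", "T"}) else "transversion"
    return {"position": pos, "ref": ref, "alt": alt, "kind": kind}
```

A full implementation would dispatch on the many other HGVS forms (deletions, duplications, insertions, protein-level descriptions) before mapping each to its ontology term.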
Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya
Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.
Schmid, Christoph D; Sengstag, Thierry; Bucher, Philipp; Delorenzi, Mauro
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. For this purpose, we have developed a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications, which require a local installation. The program written in C++ can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/. PMID:17526516
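The input/output contract described above (genome coordinates with counts in, peaks with centers and volumes out) can be imitated by a simple gap-based one-dimensional clustering. This sketch is not MADAP's algorithm, just an illustration of the task on invented transcription-start-site counts:

```python
def cluster_1d(points, max_gap=50):
    """Group (coordinate, count) pairs into peaks: a new peak starts
    whenever the gap between consecutive coordinates exceeds max_gap."""
    peaks, current = [], []
    for coord, count in sorted(points):
        if current and coord - current[-1][0] > max_gap:
            peaks.append(current)
            current = []
        current.append((coord, count))
    if current:
        peaks.append(current)
    result = []
    for peak in peaks:
        volume = sum(c for _, c in peak)               # total signal in the peak
        center = sum(x * c for x, c in peak) / volume  # count-weighted center
        result.append({"center": center, "volume": volume,
                       "span": (peak[0][0], peak[-1][0])})
    return result

# Hypothetical tag counts at genome coordinates.
tss = [(100, 5), (110, 20), (130, 10), (400, 8), (420, 2)]
peaks = cluster_1d(tss)
# two peaks: center ~114.3 with volume 35, and center 404.0 with volume 10
```

The `max_gap` parameter plays the role of the user-tunable parameters MADAP exposes for adapting the procedure to a specific signal type.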
End-user programmers create software to solve problems, yet the problem-solving knowledge generated in the process often remains tacit within the software artifact. One approach to exposing this knowledge is to enable the end-user to annotate the artifact as they create and use it. A 3-level model of annotation is presented and guidelines are proposed for the design of end-user programming environments supporting the explicit and literate annotation levels. These guidelines are then applied to the spreadsheet end-user programming paradigm.
Full Text Available Human protein kinases play fundamental roles in mediating the majority of signal transduction pathways in eukaryotic cells as well as a multitude of other processes involved in metabolism, cell-cycle regulation, cellular shape, motility, differentiation and apoptosis. The human protein kinome contains 518 members. Most studies that focus on the human kinome require, at some point, the visualization of large amounts of data. The visualization of such data within the framework of a phylogenetic tree may help identify key relationships between different protein kinases in view of their evolutionary distance and the information used to annotate the kinome tree. For example, studies that focus on the promiscuity of kinase inhibitors can benefit from the annotations to depict binding affinities across kinase groups. Images involving the mapping of information onto the kinome tree are common. However, producing such figures manually can be a long, arduous process prone to errors. To circumvent this issue, we have developed a web-based tool called Kinome Render (KR) that produces customized annotations on the human kinome tree. KR allows the creation and automatic overlay of customizable text or shape-based annotations of different sizes and colors on the human kinome tree. The web interface can be accessed at: http://bcb.med.usherbrooke.ca/kinomerender. A stand-alone version is also available and can be run locally.
Zhang, Yuanwei; Zang, Qiguang; Zhang, Huan; Ban, Rongjun; Yang, Yifan; Iqbal, Furhan; Li, Ao; Shi, Qinghua
Small RNA (sRNA) sequencing technology has revealed that microRNAs (miRNAs) are capable of exhibiting frequent variations from their canonical sequences, generating multiple variants: the isoforms of miRNAs (isomiRs). However, an integrated tool to precisely detect and systematically annotate isomiRs from sRNA sequencing data is still in great demand. Here, we present an online tool, DeAnnIso (Detection and Annotation of IsomiRs from sRNA sequencing data). DeAnnIso can detect all the isomiRs in an uploaded sample, and can extract the differentially expressed isomiRs from paired or multiple samples. Once isomiR detection is complete, detailed annotation information, including isomiR expression, isomiR classification, SNPs in miRNAs and tissue-specific isomiR expression, is provided to users. Furthermore, DeAnnIso provides a comprehensive module of target analysis and enrichment analysis for the selected isomiRs. Taken together, DeAnnIso is convenient for users to screen for isomiRs of their interest and useful for further functional studies. The server is implemented in PHP + Perl + R and available to all users for free at: http://mcg.ustc.edu.cn/bsc/deanniso/ and http://mcg2.ustc.edu.cn/bsc/deanniso/. PMID:27179030
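The isomiR classification step described above amounts to comparing a read's alignment boundaries with those of the canonical mature miRNA within the precursor. A minimal sketch handling only templated 5'/3' variants (not DeAnnIso's pipeline; the precursor flanks below are invented):

```python
def classify_isomir(read, canonical, precursor):
    """Label a small-RNA read relative to the canonical mature miRNA,
    using positions within the precursor (templated variants only)."""
    c = precursor.find(canonical)
    r = precursor.find(read)
    if r == -1:
        return "unaligned (possible non-templated addition or editing)"
    d5 = r - c                                   # >0: 5' trimming
    d3 = (r + len(read)) - (c + len(canonical))  # >0: 3' extension
    if d5 == 0 and d3 == 0:
        return "canonical"
    parts = []
    if d5:
        parts.append(f"5' {'trim' if d5 > 0 else 'extension'} ({abs(d5)} nt)")
    if d3:
        parts.append(f"3' {'extension' if d3 > 0 else 'trim'} ({abs(d3)} nt)")
    return ", ".join(parts)

canonical = "UGAGGUAGUAGGUUGUAUAGUU"       # let-7a-5p mature sequence
precursor = "GGAUC" + canonical + "CUACC"  # flanking bases invented for illustration

# classify_isomir(canonical[1:], canonical, precursor) -> "5' trim (1 nt)"
```

A real detector additionally resolves non-templated additions, internal edits and multi-locus mappings, which this position-based sketch deliberately reports only as "unaligned".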
Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik; Gorodkin, Jan; Hedegaard, Jakob; Cirera, Susanne; Thomsen, Bo; Madsen, Lone B.; Hoj, Anette; Vingborg, Rikke K.; Zahn, Bujie; Wang, Xuegang; Wang, Xuefei; Wernersson, Rasmus; Jørgensen, Claus B.; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J.; Havgaard, Jakob H.; Brunak, Søren; Fredholm, Merete; Bendixen, Christian
MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large-scale sequencing projects. The increasing availability of sequence trace data in … manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining of polymorphisms based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non-synonymous SNPs were analyzed for their potential effect on protein structure/function using the PolyPhen and SIFT prediction programs. Predicted SNPs and annotations are stored in a web-based database. Using MAVIANT, SNPs can be visually verified based on the DNA sequencing traces. A subset of candidate SNPs …
Full Text Available Abstract Background The Specialized Program of Research Excellence (SPORE) in Head and Neck Cancer neoplasm virtual biorepository is a bioinformatics-supported system to incorporate data from various clinical, pathological, and molecular systems into a single architecture based on a set of common data elements (CDEs) that provides semantic and syntactic interoperability of data sets. Results The various components of this annotation tool include the development of Common Data Elements (CDEs) that are derived from the College of American Pathologists (CAP) Checklist and North American Association of Central Cancer Registries (NAACR) standards. The Data Entry Tool is a portable and flexible Oracle-based data entry device, which is an easily mastered web-based tool. The Data Query Tool helps investigators and researchers to search de-identified information within the warehouse/resource through a "point and click" interface, thus enabling only the selected data elements to be essentially copied into a data mart using a multi-dimensional model from the warehouse's relational structure. The SPORE Head and Neck Neoplasm Database contains multimodal datasets that are accessible to investigators via an easy-to-use query tool. The database currently holds 6553 cases and 10607 tumor accessions. Among these, there are 965 metastatic, 4227 primary, 1369 recurrent, and 483 new primary cases. The data disclosure is strictly regulated by the user's authorization. Conclusion The SPORE Head and Neck Neoplasm Virtual Biorepository is a robust translational biomedical informatics tool that can facilitate basic science, clinical, and translational research. The Data Query Tool acts as a central source providing a mechanism for researchers to efficiently find clinically annotated datasets and biospecimens that are relevant to their research areas. The tool protects patient privacy by revealing only de-identified data in accordance with regulations and approvals of the IRB and
Shabajee, Paul; Miller, Libby; Dingley, Andy
A group of research projects based at HP-Labs Bristol, the University of Bristol (England) and ARKive (a new large multimedia database project focused on the world's biodiversity, based in the United Kingdom) are working to develop a flexible model for the indexing of multimedia collections that allows users to annotate content utilizing extensible…
Muhimmah, I.; Nugraha, D. DC
Reading a slide under a microscope manually is very complicated. An expert may spend 3-4 hours reading a single slide. Moreover, the intra- and inter-observer variability is known to be high. This prototype was developed to simplify the slide-reading process on Android devices in order to accelerate the reading process and generate more accurate information. The prototype allows users to annotate the boundaries of an object. Moreover, the proposed prototype has successfully reconstructed multiple object boundaries into simple closed curves from a limited amount of user input. The coordinates of the annotated objects are stored in a text file (*.txt) that can be used for further analysis. The prototype's performance with respect to time and memory usage is also reported.
Pers, Tune Hannes; Timshel, Pascal; Hirschhorn, Joel N.
Summary: An important computational step following genome-wide association studies (GWAS) is to assess whether disease- or trait-associated single-nucleotide polymorphisms (SNPs) enrich for particular biological annotations. SNP-based enrichment analysis needs to account for biases such as co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function. The SNPsnap Web server enables SNP-based enrichment analysis by providing matched sets of SNPs that can be used to calibrate background expectations. Specifically, SNPsnap efficiently identifies sets of randomly drawn SNPs that are matched to a set of query SNPs based on allele frequency, number of SNPs in LD, distance to nearest gene and gene density. Availability and implementation: SNPsnap server is available at http://www.broadinstitute.org/mpg/snpsnap/. Contact: joelh...
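The matching strategy described above (drawing random SNPs whose properties mirror those of the query SNPs) can be sketched by binning SNPs on their properties and sampling within each query's bin. All SNP IDs and property values below are simulated, and the real SNPsnap matches on more properties computed from 1000 Genomes data:

```python
import random

# Simulated SNP table: id -> (minor allele frequency, genes within some window).
rng0 = random.Random(42)
snps = {f"rs{i}": (round(rng0.uniform(0.01, 0.5), 2), rng0.randint(0, 10))
        for i in range(500)}

def bin_key(maf, density, maf_step=0.05, dens_step=2):
    """Discretize SNP properties into a matching bin."""
    return (int(maf / maf_step), int(density / dens_step))

def matched_set(query_ids, n_draws, seed=0):
    """For each query SNP, draw up to n_draws random SNPs from its property bin."""
    rng = random.Random(seed)
    bins = {}
    for sid, props in snps.items():
        bins.setdefault(bin_key(*props), []).append(sid)
    out = {}
    for q in query_ids:
        pool = [s for s in bins[bin_key(*snps[q])] if s != q]
        out[q] = rng.sample(pool, min(n_draws, len(pool)))
    return out

matches = matched_set(["rs1", "rs2"], n_draws=5)
```

Binning trades exactness for speed; a production matcher would also relax bin boundaries when a bin is too sparse to supply the requested number of draws.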
Virtual Ribosome is a DNA translation tool with two areas of focus: (i) providing a strong translation tool in its own right, with an integrated ORF finder, full support for the IUPAC degenerate DNA alphabet and all translation tables defined by the NCBI taxonomy group, including the use of...
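The core of such a tool (codon-table translation plus a forward-strand ORF scan) can be sketched as follows; this is a simplified illustration using only the standard genetic code, whereas Virtual Ribosome additionally handles degenerate IUPAC bases, reverse strands and all NCBI translation tables:

```python
# Build the standard genetic code (NCBI table 1) from its canonical
# amino-acid string in TCAG codon order.
CODONS = {}
BASES = "TCAG"
AAS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
for i, aa in enumerate(AAS):
    CODONS[BASES[i // 16] + BASES[i // 4 % 4] + BASES[i % 4]] = aa

def translate(dna):
    """Translate a DNA string codon by codon (frame 0, no stop handling)."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def find_orfs(dna, min_aa=2):
    """Naive forward-strand ORF finder: (start, peptide) for ATG..stop runs."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                pep = ""
                for j in range(i, len(dna) - 2, 3):
                    aa = CODONS[dna[j:j + 3]]
                    if aa == "*":
                        if len(pep) >= min_aa:
                            orfs.append((i, pep))
                        break
                    pep += aa
                i = j + 3  # resume the scan past this ORF
            else:
                i += 3
    return orfs
```

Supporting degenerate bases would mean expanding each IUPAC symbol to its base set and translating a codon only when all expansions agree on one amino acid.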
Gu, Shengyin; Anderson, Iain; Kunin, Victor; Cipriano, Michael; Minovitsky, Simon; Weber, Gunther; Amenta, Nina; Hamann, Bernd; Dubchak, Inna
Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
Abate, Francesco; Zairis, Sakellarios; Ficarra, Elisa; Acquaviva, Andrea; Wiggins, Chris H.; Frattini, Veronique; Lasorella, Anna; Iavarone, Antonio; Inghirami, Giorgio; Rabadan, Raul
Background The extraordinary success of imatinib in the treatment of BCR-ABL1 associated cancers underscores the need to identify novel functional gene fusions in cancer. RNA sequencing offers a genome-wide view of expressed transcripts, uncovering biologically functional gene fusions. Although several bioinformatics tools are already available for the detection of putative fusion transcripts, candidate event lists are plagued with non-functional read-through events, reverse transcriptase tem...
Full Text Available Abstract Background While the genomic annotations of diverse lineages of the Mycobacterium tuberculosis complex are available, divergences between gene prediction methods are still a challenge for unbiased protein dataset generation. M. tuberculosis gene annotation is an example, where the most used datasets from two independent institutions (the Sanger Institute and The Institute for Genomic Research, TIGR) differ by up to 12% in the number of annotated open reading frames, and 46% of the genes contained in both annotations have different start codons. Such differences emphasize the importance of identifying the sequences of protein products to validate each gene annotation, including its coding sequence boundaries. Results With this objective, we submitted a culture filtrate sample from M. tuberculosis to high-accuracy LTQ-Orbitrap mass spectrometry analysis and applied refined N-terminal prediction to perform a comparison of the two gene annotations. From a total of 449 proteins identified from the MS data, we validated 35 tryptic peptides that were specific to one of the two datasets, representing 24 different proteins. Of those, 5 proteins were only annotated in the Sanger database. For the remaining proteins, the observed differences were due to differences in the annotation of translational start sites. Conclusion Our results indicate that, even in a less complex sample likely to represent only 10% of the bacterial proteome, we were still able to detect major differences between different gene annotation approaches. This gives hope that high-throughput proteomics techniques can be used to improve and validate gene annotations, and in particular for verification of high-throughput, automatic gene annotations.
Hannah Matthew A
Full Text Available Abstract Background Microarray technology has become a widely accepted and standardized tool in biology. The first microarray data analysis programs were developed to support pair-wise comparison. However, as microarray experiments have become more routine, large-scale experiments investigating multiple time points or sets of mutants or transgenics have become more common. To extract biological information from such high-throughput expression data, it is necessary to develop efficient analytical platforms, which combine manually curated gene ontologies with efficient visualization and navigation tools. Currently, most tools focus on a few limited biological aspects, rather than offering a holistic, integrated analysis. Results Here we introduce PageMan, a multiplatform, user-friendly, and stand-alone software tool that annotates, investigates, and condenses high-throughput microarray data in the context of functional ontologies. It includes a GUI tool to transform different ontologies into a suitable format, enabling the user to compare and choose between different ontologies. It is equipped with several statistical modules for data analysis, including over-representation analysis and Wilcoxon statistical testing. Results are exported in a graphical format for direct use, or for further editing in graphics programs. PageMan provides a fast overview of single treatments, and allows genome-level responses to be compared across several microarray experiments covering, for example, stress responses at multiple time points. This aids in searching for trait-specific changes in pathways using mutants or transgenics, analyzing development time-courses, and comparison between species. In a case study, we analyze the results of publicly available microarrays of multiple cold stress experiments using PageMan, and compare the results to a previously published meta-analysis. PageMan offers a complete user's guide, a web-based over-representation analysis as
The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., a two-condition case-control) to very complicated (e.g., a time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), BioSample, BioProject, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.
The primary objective of this research is to identify the determinants of market failure that affect transactions between the tool, die and mould making industry and Quality Management service providers within the Nelson Mandela Bay area. A literature review of market failure, the service economy and the evolution of donors' interventions in business service markets was conducted. It provided valuable insights into the identification of issues that could indicate whether a market was performi...
This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and located in such a way that the original English text is not altered (not even a letter), thus allowing for a consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that keep the frequency of annotations from being dramatically high. This lets the reader easily associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which otherwise are weakened by the enormous number of exceptions in English pronunciation. The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size. This means that we can get an annotated version of any document or book with the same number of pages and font size. Since no letter is affected, the ...
Abstract Background maxdLoad2 is a relational database schema and Java® application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web application that makes the contents of maxdLoad2 databases accessible via web-browser, command-line and web-service environments. It thus acts as both a dissemination and data-mining tool. Results maxdLoad2 presents an easy-to-use interface to an underlying relational database and provides a full complement of facilities for browsing, searching and editing. There is a tree-based visualization of data connectivity and the ability to explore the links between any pair of data elements, irrespective of how many intermediate links lie between them. Its principal novel features are: • the flexibility of the meta-data that can be captured, • the tools provided for importing data from spreadsheets and other tabular representations, • the tools provided for the automatic creation of structured documents, • the ability to browse and access the data via web and web-services interfaces. Within maxdLoad2 it is very straightforward to customise the meta-data that is being captured or change the definitions of the meta-data. These meta-data definitions are stored within the database itself, allowing client software to connect properly to a modified database without having to be specially configured. The meta-data definitions (configuration file) can also be centralized, allowing changes made in response to revisions of standards or terminologies to be propagated to clients without user intervention. maxdBrowse is hosted on a web-server and presents
Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Rubin, Daniel L
An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human (or machine) observer. An image markup is the set of graphical symbols placed over the image to depict an annotation. In the majority of current clinical and research imaging practice, markup is captured in proprietary formats and annotations are referenced only in free-text radiology reports. This makes these annotations difficult to query, retrieve and compute upon, hampering their integration into other data mining and analysis efforts. This paper describes the National Cancer Institute's Cancer Biomedical Informatics Grid (caBIG) Annotation and Image Markup (AIM) project, focusing on how to use AIM to query for annotations. The AIM project delivers an information model for image annotation and markup. The model uses controlled terminologies for important concepts. All of the classes and attributes of the model have been harmonized with the other models and common data elements in use at the National Cancer Institute. The project also delivers the XML schemata necessary to instantiate AIM annotations in XML, as well as a software application for translating AIM XML into DICOM SR and HL7 CDA. Large collections of AIM annotations can be built and then queried as Grid or Web services. Using the tools of the AIM project, image annotations and their markup can be captured and stored in human- and machine-readable formats. This enables the inclusion of human image observation and inference as part of larger data mining and analysis activities. PMID:19964202
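A rough illustration of why a structured XML model makes annotations queryable, unlike free-text reports. The element and attribute names below are invented for the sketch and do not reproduce the actual AIM schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified AIM-style annotation document; the real
# AIM schemata define many more classes and use controlled terminology codes.
doc = """<ImageAnnotation name="lesion-1">
  <ImagingObservation codeMeaning="Mass"/>
  <ImagingObservation codeMeaning="Calcification"/>
</ImageAnnotation>"""

root = ET.fromstring(doc)
# Because observations are discrete, typed elements rather than prose,
# they can be extracted and filtered programmatically:
meanings = [obs.get("codeMeaning") for obs in root.iter("ImagingObservation")]
```

The same structural property is what allows large AIM collections to be exposed as queryable Grid or Web services.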
Letunic, I.; Bork, P.
Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. The current version was completely redesigned and rewritten, utilizing current web technologies for speedy and streamlined processing. Numerous new features were introduced and several new data types are now supported. Trees with up to 100,000 leaves can now be efficiently displayed. Full interactive control over pr...
Schulze, Alexandra E; De Beer, Dalene; Mazibuko, Sithandiwe E; Muller, Christo J F; Roux, Candice; Willenburg, Elize L; Nyunaï, Nyemb; Louw, Johan; Manley, Marena; Joubert, Elizabeth
Similarity analysis of the phenolic fingerprints of a large number of aqueous extracts of Cyclopia subternata, obtained by high-performance liquid chromatography (HPLC), was evaluated as a potential tool to screen extracts for relative bioactivity. The assessment was based on the (dis)similarity of their fingerprints to that of a reference active extract of C. subternata, proven to enhance glucose uptake in vitro and in vivo. In vitro testing of extracts selected as being most similar (n = 5; r ≥ 0.962) and most dissimilar (n = 5; r ≤ 0.688) to the reference active extract showed that no clear pattern in terms of relative glucose uptake efficacy in C2C12 myocytes emerged, irrespective of the dose. Some of the most dissimilar extracts had higher glucose-lowering activity than the reference active extract. Principal component analysis revealed the major compounds responsible for most of the variation within the chromatographic fingerprints to be mangiferin, isomangiferin, iriflophenone-3-C-β-D-glucoside-4-O-β-D-glucoside, iriflophenone-3-C-β-D-glucoside, scolymoside, and phloretin-3',5'-di-C-β-D-glucoside. Quantitative analysis of the selected extracts showed that the most dissimilar extracts contained the highest mangiferin and isomangiferin levels, whilst the most similar extracts had the highest scolymoside content. These compounds demonstrated similar glucose uptake efficacy in C2C12 myocytes. It can be concluded that (dis)similarity of chromatographic fingerprints of extracts of unknown activity to that of a proven bioactive extract does not necessarily translate to lower or higher bioactivity. PMID:26542834
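The r values quoted above are correlation coefficients between fingerprints. A minimal sketch of such a comparison, assuming (as the study's HPLC workflow implies but this sketch does not reproduce) that the two chromatograms are sampled on a common retention-time grid:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally sampled chromatographic
    fingerprints (e.g., absorbance values on a shared retention-time grid)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Identical peak patterns give r = 1; under the study's thresholds an
# extract with r >= 0.962 to the reference would be classed "most similar".
```

Note that r measures the shape of the whole profile, which is exactly why, as the abstract concludes, it need not track the activity driven by a few individual compounds.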
I present a tool which assesses the quality or usefulness of a document based on annotations. Annotations may include comments, notes, observations, highlights, underlining, explanations, questions, or help requests. Comments are used for evaluative purposes, while the others are used for summarization or expansion. Further, these comments may themselves be made on another annotation; such annotations are referred to as meta-annotations. All annotations may not get equal weightage. My tool considers highlights and underlining as well as comments to infer the collective sentiment of annotators. The collective sentiment of annotators is classified as positive, negative, or objective. My tool computes the collective sentiment of annotations in two manners: it counts all the annotations present on the document, and it also computes sentiment scores of all annotations, including comments, to obtain the collective sentiment about the document and thus judge its quality. I demonstrate the use of the tool on a research paper.
Falquet, Gilles; De Ribaupierre, Hélène
When searching for documents, scientists have precise objectives in mind. We conducted interviews with scientists to understand more precisely how they search for their information and work with the documents they find. We observed that scientists search for information within specific discourse elements, and not always in the document as a whole. From this, we created an annotation model that takes into account this...
Nashar Karim; Wood A Joseph; Hulme Helen; Hayes Andrew; Morrison Norman; Velarde Giles; Wilson Michael; Hancock David; Kell Douglas B; Brass Andy
Abstract Background maxdLoad2 is a relational database schema and Java® application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture; including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBr...
李雪娟; 杨婧; 王俊红; 任倩俐; 李霞; 黄原
With the increasing popularity of mitochondrial genome studies, the correct assembly and annotation of genomes is the basis of all subsequent research into a species. Here we describe protocols using the Staden Package software to assemble and annotate the mitochondrial genome, along with other commonly used software such as ContigExpress, DNAMAN, DNASTAR, BioEdit and Sequencher. In addition, methods for the use of different software packages (including DOGMA, MOSAS, MITOS, GOBASE, OGRe, MitoZoa, tRNAscan-SE, ARWEN, BLAST and MiTFi) to annotate mitochondrial genomic protein-coding genes, rRNA, tRNA and the A+T-rich region are briefly introduced. Finally, the application of MEGA5 software to analyze the composition of mitochondrial genomes, Sequin software to submit sequences to GenBank, and mitochondrial genome data visualization tools (CGView, MTviz and OGDRAW) are also briefly introduced.
Zad, Damon Daylamani; Agius, Harry
In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the durations of which are increasing as upload restrictions fall. While simple tags may have been sufficient for most purposes for traditional, very short video footage containing a relatively small amount of semantic content, this is not the case for movies of longer duration, which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively, and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment in which metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented, which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.
Sarver Aaron L
Abstract Background Next generation sequencing approaches applied to the analyses of transposon insertion junction fragments generated in high-throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches utilized to (1) map junction fragments within the genome and (2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation. Results We describe the Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short-read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and utilized to keep track of sequences, mappings and common insertion sites. Additionally, associations between phenotypes and CISs are also identified using Fisher's exact test with multiple testing correction. In a case study using previously published data we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data. Conclusions The TAPDANCE process is fully automated and performs similarly to previous labor-intensive approaches
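The Poisson ranking step that TAPDANCE automates can be sketched as follows. This is a stdlib-only illustration under a uniform-insertion null model, not the TAPDANCE code, and the window and genome sizes in the example are hypothetical:

```python
from math import exp, factorial

def poisson_sf(k, lam):
    """P(X >= k) for a Poisson(lam) variable: 1 - CDF(k - 1)."""
    return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))

def cis_pvalue(hits, total_insertions, window_bp, genome_bp):
    """P-value for a candidate Common Insertion Site: probability of
    observing >= hits insertions in one window if all mapped insertions
    had landed uniformly across the genome."""
    lam = total_insertions * window_bp / genome_bp  # expected hits per window
    return poisson_sf(hits, lam)

# Hypothetical numbers: 8 insertions in a 50 kb window, given 10,000
# mapped insertions across a 2.5 Gb genome (expected ~0.2 per window).
p = cis_pvalue(8, 10_000, 50_000, 2_500_000_000)
```

Regions would then be ranked by this p-value; correction for the many windows tested, as the abstract notes for the phenotype associations, is still required.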
We present Annotated Answer Set Programming, which extends the expressive power of disjunctive logic programming with annotation terms, taken from the generalized annotated logic programming framework.
United States Holocaust Memorial Museum, Washington, DC.
This annotated list of 43 videotapes recommended for classroom use addresses various themes for teaching about the Holocaust, including: (1) overviews of the Holocaust; (2) life before the Holocaust; (3) propaganda; (4) racism, anti-Semitism; (5) "enemies of the state"; (6) ghettos; (7) camps; (8) genocide; (9) rescue; (10) resistance; (11)…
Melamed, I D
Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed . The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially-designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably rel...
Smartphone ownership among undergraduates in Malaysia was recorded as high. However, little was known about its utilisation patterns; thus, the focus of this research was to determine the utilisation patterns of smartphones based on the National Education Technology Standards for Students (NETS.S) among engineering undergraduates in Malaysia. This study was based on quantitative research, and the population comprised undergraduates from four Malaysian Technical Universities. A total of 400 questionnaires were analyzed. Based on the results, the undergraduates' utilisation level of smartphones as a communication and collaboration tool was high. Meanwhile, utilisation as an operations and concepts tool and as a research and information fluency tool was at a moderate level. Finally, smartphone utilisation as a digital citizenship tool and as a critical thinking, problem solving and creativity tool were both at a low level. Hence, more training and workshops should be given to the students in order to encourage them to fully utilise smartphones in enhancing higher order thinking skills.
Tang, Hong; Chen, Yunhao
In this paper, we present an approach to learning latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: (1) ambiguous correspondences between visual features and annotated keywords; (2) incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning topic models. In particular, some imagined keywords are poured into the incomplete annotation by measuring similarity between keywords. Then, both given and imagined annotations are used to learn probabilistic topic models for automatically annotating new images. We conduct experiments on a typical Corel dataset of images and loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods (using a set of discrete blobs to represent an image). The proposed method improves word-driven probabilistic Latent Semantic Analysis (PLSA-words) up to ...
Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678
Bug, William; Astahkov, Vadim; Boline, Jyl; Fennema-Notestine, Christine; Grethe, Jeffrey S; Gupta, Amarnath; Kennedy, David N; Rubin, Daniel L; Sanders, Brian; Turner, Jessica A; Martone, Maryann E
The broadly defined mission of the Biomedical Informatics Research Network (BIRN, www.nbirn.net) is to better understand the causes of human disease and the specific ways in which animal models inform that understanding. To construct the community-wide infrastructure for gathering, organizing and managing this knowledge, BIRN is developing a federated architecture for linking multiple databases across sites contributing data and knowledge. Navigating across these distributed data sources requires a shared semantic scheme and a supporting software framework to actively link the disparate repositories. At the core of this knowledge organization is BIRNLex, a formally represented ontology facilitating data exchange. Source curators enable database interoperability by mapping their schemas and data to BIRNLex semantic classes, thereby providing a means to cast BIRNLex-based queries against specific data sources in the federation. We will illustrate use of the source registration, term mapping, and query tools. PMID:18999211
Toussaint, Yannick; Tenier, Sylvain
Current semantic annotation systems make little use of domain knowledge and operate essentially from the text towards the ontology. Yet it is common for an element in a page to need to be annotated with a concept because certain other elements of the same page are annotated with other concepts. This article proposes an annotation method that takes this dependency between concepts into account, expressed in an ontology in the form of defined concepts. The use of...
Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, the Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products into stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography, which contains almost 100 citations of articles, books, and resources on topics related to the communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners
Fallahkhair, Sanaz; Kennedy, Ian
Lifelong learning requires many skills that are often not taught or are poorly taught. Such skills include speed reading, critical analysis, creative thinking, active reading and even a "little" skill like annotation. There are many ways that readers annotate. A short classification of some ways that readers may annotate includes underlining, using coloured highlighters, interlinear notes, marginal notes, and disassociated notes. This paper presents an investigation into the use of a tool for ...
Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James
Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. PMID:27342282
We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements. The...
Lim, Ee-Lon; Hew, Khe Foon
E-books offer a range of benefits to both educators and students, including ease of accessibility and searching capabilities. However, the majority of current e-books are repository-cum-delivery platforms of textual information. Hitherto, there is a lack of empirical research that examines e-books with annotative and sharing capabilities. This…
Gresham Cathy R
Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to the little functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays and a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that enables microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers a significant amount of time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray datasets into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via the AgBase website and
Motivation: Text mining in the biomedical domain in recent years has focused on the development of tools for recognizing named entities and extracting relations. Such research resulted from the need for such tools as basic components for more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction have now reached a stage where they perform comparably to human annotators (considering inter-annotator agreement, measured in many studies to be aro...
Ebner, Hannes; Manouselis, Nikos; Palmér, Matthias; Enoksson, Fredrik; Palavitsinis, Nikos; Kastrantas, Kostas; Naeve, Ambjörn
This paper introduces a Web-based tool that has been developed to facilitate learning object annotation in agricultural learning repositories with IEEE LOM-compliant metadata. More specifically, it presents how an application profile of the IEEE LOM standard has been developed for the description of learning objects on organic agriculture and agroecology. Then, it describes the design and prototype development of the Organic.Edunet repository tool: a Web-based tool for annotating learning objects ...
This paper describes AnnaBot, one of the first tools to verify correct use of Annotation-based metadata in the Java programming language. These Annotations are a standard Java 5 mechanism used to attach metadata to types, methods, or fields without using an external configuration file. A binary representation of the Annotation becomes part of the compiled “.class” file, for inspection by another component or library at runtime. Java Annotations were introduced into the Java language in 2004 a...
Roussey, C.; Bernard, S.
In this article we describe the different annotation schemas considered for annotating agricultural bulletins available on the web. Our goal is to publish on the web of data both the manual annotations used for cataloguing the bulletins and the indexes usable by a semantic information retrieval system.
Li, Yunjia; Wald, Mike; Wills, Gary
With the fast growth of multimedia sharing and annotating applications on the Web, there is increasing research interest in semantic annotation of multimedia. However, applying linked data principles in multimedia annotation is a relatively new topic, especially when annotations are related to media fragments. This paper therefore discusses this problem and further breaks it down into three fundamental sub-questions: 1) choosing media fragment URIs; 2) dereferencing media fragment URIs...
Sun, Choong-Hyun; Kim, Min-Sung; Han, Youngwoong; Yi, Gwan-Su
COFECO is a web-based tool for composite annotation of protein complexes, KEGG pathways, and Gene Ontology (GO) terms within a class of genes and their orthologs under study. Widely used functional enrichment tools using GO and KEGG pathways create large lists of annotations that make it difficult to derive consolidated information, and often include over-generalized terms. The interrelationship of annotation terms can be more clearly delineated by integrating the information of physically int...
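The enrichment statistics underlying tools of this kind are typically hypergeometric tests. A minimal sketch (an assumption about the general approach, not COFECO's actual implementation) of the upper-tail p-value for one annotation term:

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k): chance of seeing at least k term-annotated genes when
    drawing a gene list of size n from a universe of N genes, K of which
    carry the term. Small p suggests the term is enriched in the list."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy numbers: universe of 20 genes, 5 carry the term,
# a gene list of 5 contains 3 of them.
p = hypergeom_enrichment_p(20, 5, 5, 3)
```

Real tools additionally correct such p-values for testing many terms at once (e.g. Bonferroni or false-discovery-rate adjustment).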
IIS--Integrated Interactome System: a web-based platform for the annotation, analysis and visualization of protein-metabolite-gene-drug interactions by integrating a variety of data sources and tools.
Marcelo Falsarella Carazzolle
High-throughput screening of physical, genetic, and chemical-genetic interactions brings important perspectives to the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations, and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and produce a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis, and visualization of the interaction profiles of proteins/genes, metabolites, and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites, and drugs of interest, and add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, GO biological process enrichment, and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We have developed IIS by integrating diverse databases, in response to the need for appropriate tools for a systematic analysis of physical, genetic, and chemical-genetic interactions. IIS was validated with
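The XGMML export step mentioned above can be illustrated with a minimal writer. Element and attribute names follow the XGMML schema, but the network content here is invented and far simpler than real IIS output, which carries many more node attributes (expression levels, GO terms, localization, and so on):

```python
import xml.etree.ElementTree as ET

def network_to_xgmml(name, edges):
    """Serialize an interaction network (list of node-name pairs) as a
    minimal XGMML document suitable for import into Cytoscape."""
    graph = ET.Element("graph", {"label": name,
                                 "xmlns": "http://www.cs.rpi.edu/XGMML"})
    # One <node> per distinct interactor, with a stable numeric id.
    nodes = sorted({n for e in edges for n in e})
    index = {n: str(i) for i, n in enumerate(nodes)}
    for n in nodes:
        ET.SubElement(graph, "node", {"id": index[n], "label": n})
    # One <edge> per interaction, referencing the node ids.
    for a, b in edges:
        ET.SubElement(graph, "edge", {"source": index[a], "target": index[b]})
    return ET.tostring(graph, encoding="unicode")

# Hypothetical two-interaction network.
xml_text = network_to_xgmml("demo", [("P53", "MDM2"), ("MDM2", "UBC")])
```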
Buza, Teresia J; Kumar, Ranjit; Gresham, Cathy R; Burgess, Shane C.; McCarthy, Fiona M
Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the largest arrays serving as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually...
In functional programming languages, the classic form of annotation is a single type constraint on a term. Intersection types add complications: a single term may have to be checked several times against different types, in different contexts, requiring annotation with several types. Moreover, it is useful (in some systems, necessary) to indicate the context in which each such type is to be used. This paper explores the technical design space of annotations in systems with intersection types. Earlier work (Dunfield and Pfenning 2004) introduced contextual typing annotations, which we now tease apart into more elementary mechanisms: a "right-hand" annotation (the standard form), a "left-hand" annotation (the context in which a right-hand annotation is to be used), a merge that allows for multiple annotations, and an existential binder for index variables. The most novel element is the left-hand annotation, which guards terms (and right-hand annotations) with a judgment that must follow from the current context.
Hague, Matthew; Penelle, Vincent
Annotated pushdown automata provide an automaton model of higher-order recursion schemes, which may in turn be used to model higher-order programs for the purposes of verification. We study Ground Annotated Stack Tree Rewrite Systems -- a tree rewrite system where each node is labelled by the configuration of an annotated pushdown automaton. This allows the modelling of fork and join constructs in higher-order programs and is a generalisation of higher-order stack trees recently introduced by...
Cieri, Christopher; Bird, Steven
Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines areas of common need among empirical linguists and computational linguists. After reviewing examples of data and tools used or under development for each of several areas, it proposes a common framework for future tool development, data annotation and resource sharing based upon annotation graphs an...
As the sequencing stage of the human genome project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. Reviewed here are the current major bioinformatics tools and technologies available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.
Jerbi, Houssem; Ravat, Franck; Teste, Olivier
This paper deals with personalization of annotated OLAP systems. Data constellation is extended to support annotations and user preferences. Annotations reflect the decision-maker experience whereas user preferences enable users to focus on the most interesting data. User preferences allow annotated contextual recommendations helping the decision-maker during his/her multidimensional navigations.
Abe, Jair Minoro; Nakamatsu, Kazumi
This book is written as an introduction to annotated logics. It provides logical foundations for annotated logics, discusses some interesting applications of these logics, and also includes the authors' contributions to annotated logics. The central idea of the book is to show how annotated logic can be applied as a tool to solve problems of technology and of applied science. The book will be of interest to pure and applied logicians, philosophers, and computer scientists as a monograph on a kind of paraconsistent logic. But the lay reader will also profit from reading it.
Brust, Matthias R
Today's computer-based annotation systems implement a wide range of functionalities that often go beyond those available in traditional paper-and-pencil annotation. Conceptually, annotation systems are based on thoroughly investigated psycho-sociological and pedagogical learning theories. They offer a huge diversity of annotation types that can be placed in textual as well as in multimedia formats. Additionally, annotations can be published or shared with a group of interested parties via well-organized repositories. Although highly sophisticated annotation systems exist both conceptually and technologically, we still observe that their acceptance is somewhat limited. In this paper, we argue that today's annotation systems suffer from several fundamental problems that are inherent in the traditional paper-and-pencil annotation paradigm. As a solution, we propose to shift the annotation paradigm for the implementation of annotation systems.
Robert A Morris
Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.
Examples of automated evaluation platforms deployed as Web services are currently very rare and often underestimated. Time and effort savings, faster system improvement, a common paradigm of evaluation for a community: the benefits offered by such services are plentiful. In this paper, we present a platform for automatically evaluating parsers and comment on its deployment during an evaluation campaign. First, we draw up a state of the art for platforms used in the evaluation of NLP systems; then we present the tools available for Web server deployment. Next, we describe our platform and its deployment in the PASSAGE project as a Web server. Finally, we show the interest of generalizing such a service to other NLP domains.
Anantharaman, K.; Shivakumar, V.; Saha, D.
India's nuclear programme envisages large-scale utilisation of thorium, as the country has limited deposits of uranium but vast deposits of thorium. The large-scale utilisation of thorium requires the adoption of a closed fuel cycle. The stable nature of thoria and the radiological issues associated with it pose challenges in the adoption of a closed fuel cycle. A thorium-fuelled Advanced Heavy Water Reactor (AHWR) is being planned to provide impetus to the development of technologies for the closed thorium fuel cycle. Thoria fuel has been loaded in Indian reactors, and test irradiations have been carried out with (Th-Pu) MOX fuel. Irradiated thorium assemblies have been reprocessed, and the separated 233U fuel has been used for the test reactor KAMINI. The paper highlights the Indian experience with the use of thorium and brings out various issues associated with the thorium cycle.
Mayer Klaus FX; Spannagl Manuel; Ernst Rebecca; Klee Kathrin
Abstract Background Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. Results To overcome these limitations we extended the Apollo data adapter with a generic, co...
AlJadda, Khalifeh; Ranzinger, Rene; Porterfield, Melody; Weatherly, Brent; Korayem, Mohammed; Miller, John A.; Rasheed, Khaled; Kochut, Krys J; York, William S.
Several algorithms and tools have been developed to (semi-)automate the process of glycan identification by interpreting mass spectrometric data. However, each has limitations when annotating MSn data with thousands of MS spectra using uncurated public databases. Moreover, the existing tools are not designed to manage MSn data where n > 2. We propose a novel software package to automate the annotation of tandem MS data. This software consists of two major components. The first is a free, sem...
Hamilton John P
Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate (1) the submission of gene annotations to an annotation project, (2) the review of the submitted models by project annotators, and (3) the incorporation of the submitted models into the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotations can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the
Hamilton John P; Campbell Matthew; Thibaud-Nissen Françoise; Zhu Wei; Buell C
Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging k...
This thesis focuses on the bioinformatic analysis and annotation of the genome of the marine planctomycete Rhodopirellula baltica. A comprehensive bioinformatic pipeline was set up and established that comprises gene prediction, annotation and visualization tools. Considerable effort was put into the manual annotation process. The annotation of the genome of Rhodopirellula baltica revealed that this organism is specialized on the aerobic degradation of complex carbohydrates. Its genome harbors...
Saur, Drew D.; Tan, Yap-Peng; Kulkarni, Sanjeev R.; Ramadge, Peter J.
Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing, and data mining projects. A successful video annotation system should provide users with a useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation remains hard using currently available techniques. However, a wide range of video has inherent structure, such that some prior knowledge about the video content can be exploited to improve our understanding of the high-level video semantic content. In this paper, we develop tools and techniques for analyzing structured video using the low-level information available directly from MPEG compressed video. Being able to work directly in the video compressed domain can greatly reduce processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation, and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions, and possession times. We expect our approach can also be extended to structured video in other domains.
Randle-Boggis, Richard J; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D
The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180
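The precision/sensitivity trade-off described above is straightforward to quantify when, as with a simulated metagenome, the true origin of every read is known. A minimal sketch with invented read IDs and taxon labels (not the output format of MEGAN, MG-RAST, or One Codex):

```python
def precision_sensitivity(predicted, truth):
    """Score taxonomic annotations against the known composition of a
    simulated metagenome. Both arguments map read IDs to taxon labels;
    reads absent from `predicted` were left unannotated.
    precision  = correct annotations / annotations made
    sensitivity = correct annotations / reads in the truth set"""
    tp = sum(1 for read, taxon in predicted.items() if truth.get(read) == taxon)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity

# Toy simulated metagenome: three reads of known origin; the annotator
# labels two of them, one correctly. Stricter parameters would typically
# raise precision while lowering sensitivity.
truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Vibrio"}
predicted = {"r1": "Escherichia", "r2": "Pseudomonas"}
prec, sens = precision_sensitivity(predicted, truth)
```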
Gorbalenya Alexander E
Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted with biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain-text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between the quality of sequence annotation and the efficiency of communication and knowledge dissemination among researchers.
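Such a substitution can be sketched as a template-driven rewrite of UIDs inside a Newick tree string. The accession pattern, accession numbers, and name template below are made up for illustration and are far simpler than SNAD's database-backed templates:

```python
import re

def relabel_newick(tree, names):
    """Replace sequence UIDs in a Newick string with template-formatted
    names built from per-sequence annotation. UIDs without an entry in
    `names` are left untouched."""
    def sub(match):
        uid = match.group(0)
        meta = names.get(uid)
        return "{organism}_{gene}".format(**meta) if meta else uid
    # Hypothetical UID shape: two capital letters followed by six digits.
    return re.sub(r"[A-Z]{2}\d{6}", sub, tree)

# Invented annotation records keyed by UID.
names = {
    "AB123456": {"organism": "HCoV", "gene": "spike"},
    "CD654321": {"organism": "SARSr", "gene": "spike"},
}
tree = "(AB123456:0.1,CD654321:0.2);"
relabelled = relabel_newick(tree, names)
```

In a real tool the `names` table would be filled by fetching annotation from the cognate database entries rather than written by hand.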
This annotated bibliography contains summaries of articles and chapters of books that are relevant to traceability. After each summary there is a note on the relevance of the paper to the LEI project. The aim of the LEI project is to gain insight into several aspects of traceability in order to
Heaton, Pamela; Wallace, Gregory L.
Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…
Song, Dezhao; Chute, Christopher G; Tao, Cui
To facilitate clinical research, clinical data need to be stored in a machine-processable and understandable way. Manually annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfactory. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free-text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment and the linking/disconnecting of instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO Annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator. PMID:22779043
I-Min A Chen
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.
Coal-based thermal power stations have been the major source of power generation in our country in the past and will continue to be for decades to come. In India, thermal generation, which contributes about 72% of the overall power generation of 245,000 MU (1989-90), is the main source of power and is mainly based on coal firing. Total ash generation in India is presently of the order of 38 million tonnes per annum. India is fourth in the world in coal ash generation: the USSR is first (100 million tonnes), followed by the USA (45 million tonnes) and China (41 million tonnes). The basic problem of a thermal power station fired with high-ash-content coal is the generation of a huge quantity of coal ash, which poses serious environmental and other related problems. The present paper analyses the extensive scope for utilisation of coal ash and outlines the strategies to be adopted to overcome the related problems for proper utilisation of coal ash. (author). 9 tabs
Lynsay A. Shepherd
Despite the number of tools created to help end-users reduce risky security behaviours, users are still falling victim to online attacks. This paper proposes a browser extension utilising affective feedback to provide warnings on detection of risky behaviour. The paper provides an overview of behaviour considered to be risky, explaining potential threats users may face online. Existing tools developed to reduce risky security behaviours in end-users have been compared, discussing the success rates of various methodologies. Ongoing research is described which attempts to educate users regarding the risks and consequences of poor security behaviour by providing the appropriate feedback on the automatic recognition of risky behaviour. The paper concludes that a solution utilising a browser extension is a suitable method of monitoring potentially risky security behaviour. Ultimately, future work seeks to implement an affective feedback mechanism within the browser extension with the aim of improving security awareness.
Childs Kevin L
Abstract Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.
Cardeñosa Lera, Jesús; Gallardo Pérez, Carolina; Maldonado Martínez, Ángeles
Each concrete field of disciplinary or thematic specialization makes use of its own terminology. The compilation, definition, and organization of terms used in a given domain are a basic task, because they become the basis for specialized terminological resources of great usefulness. Thesauri are a type of terminological resource of increasing relevance at the present time, frequently used in the retrieval and localization of information in digital environments. The hierarchic ...
An annotated summary of 204 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred
Gunn, Teresa M.
Functional annotation of every gene in the mouse genome is a herculean task that requires a multifaceted approach. Many large-scale initiatives are contributing to this undertaking. The International Knockout Mouse Consortium (IKMC) plans to mutate every protein-coding gene, using a combination of gene trapping and gene targeting in embryonic stem cells. Many other groups are performing mutagenesis using the chemical mutagen ethylnitrosourea (ENU) or transposon-based systems to induce mutations, screening ...
Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.
Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can easily be turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of each concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.
Zhou, Jiang; Duane, Aaron; Albatal, Rami; Gurrin, Cathal; Johansen, Dag
Google Glass has the potential to be a real-time data capture and annotation tool. With professional sports as a use case, we present a platform which helps a football coach capture and annotate interesting events using Google Glass. In our implementation, an interesting event is indicated by a predefined hand gesture or motion, and our platform can automatically detect these gestures in a video without training any classifier. Three event detectors are examined and our experiment shows that the ...
Car parking used as a control tool for individual motorised travel: good practices of European towns; Le stationnement utilise comme outil de regulation des deplacements individuels motorises. Bonnes pratiques de villes europeennes
Cahn, M.; Vallar, J.P.
This study aims to identify and present significant actions by European towns in the domain of local parking policy as a control tool for motor traffic. Several cases illustrate the study, and six axes of action have been identified: parking restriction measures to protect the town centre and encourage people to use other transport modes; regulation of urban areas; initiatives in small towns; parking tariffs; assistance to disabled persons; and actions carried out in outlying areas. (A.L.B.)
R. Siemens; M. Timney; C. Leitch; C. Koolen; A. Garnett
The two annotated bibliographies presented in this publication document and feature pertinent discussions of the activity of modeling the social edition, first exploring reading devices, tools, and social media issues and, second, social networking tools for professional readers in the Humanities.
Numerous applications of coal with a high content of humic substances are known; such coals are used in many branches of industry. A complex study of the composition of coal from the upper Nitra mines has directed research toward its application in the fields of ecology and agriculture. The effective sorption layers of this coal and its humic acids can trap a broad spectrum of toxic harmful substances present in industrial wastes, particularly heavy metals. A major source of humic acids is coal, the most abundant and predominant product of plant residue coalification. All ranks of coal contain humic acids, but lignite from the Novaky deposit represents the most easily available and concentrated form of humic acids. The possibilities of using humic acids to remove heavy metals from waste waters were studied. The residual concentrations of the investigated metals in the aqueous phase were determined by AAS. The results show that the coal humic acid samples can be used for heavy metal removal from metal solutions and from real acid mine water. Oxidised coal with a high content of humic acids and nitrogen is used in agriculture as a fertilizer. Humic acids are the active component in coal and can help to utilize nitrogen in soil almost quantitatively. The humic substances also block and stabilize toxic metal residues already present in soil. (author)
Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier
High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotators. Estimating potential agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty of...
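As a rough illustration of the quantities discussed above, the sketch below computes observed pairwise agreement for a single annotated instance and a three-way discretization of it. The thresholds and the example labels are assumptions for illustration, not the authors' actual setup.

```python
# Illustrative sketch (not the authors' code): per-instance observed
# agreement and a three-way discretization of it.
from collections import Counter

def observed_agreement(labels):
    """Fraction of annotator pairs that agree on one instance."""
    n = len(labels)
    if n < 2:
        return 1.0
    pairs = n * (n - 1) / 2
    agreeing = sum(c * (c - 1) / 2 for c in Counter(labels).values())
    return agreeing / pairs

def discretize(a, low=0.4, high=0.8):
    """Three-way split of observed agreement (thresholds are assumed)."""
    return "low" if a < low else "mid" if a < high else "high"

# Six annotators labeling one occurrence of "cup"
labels = ["Container", "Container", "Content", "Container", "Content", "Container"]
a = observed_agreement(labels)   # 7 agreeing pairs out of 15
print(round(a, 3), discretize(a))
```

A continuous regressor would target `a` directly; the discretized variant targets the three bins instead.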
Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J; Murphy, Daniel N; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul; Searle, Stephen M J
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html. PMID:27337980
Background Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins. Results The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations. Conclusions In total, 141 gene models were altered and have been updated in RefSeq and GenBank, and can be accessed seamlessly through any NCBI tool (e.g. BLAST) or downloaded directly. Along with the improved gene models we discover new, more accurate means of identifying signal peptides in proteomics data.
Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Sepukar, Kastubh; Rubin, Daniel L
Image annotation and markup are at the core of medical interpretation in both the clinical and the research setting. Digital medical images are managed with the DICOM standard format. While DICOM contains a large amount of meta-data about by whom, where, and how the image was acquired, DICOM says little about the content or meaning of the pixel data. An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human or machine observer. An image markup is the graphical symbols placed over the image to depict an annotation. While DICOM is the standard for medical image acquisition, manipulation, transmission, storage, and display, there are no standards for image annotation and markup. Many systems expect annotation to be reported verbally, while markups are stored in graphical overlays or proprietary formats. This makes it difficult to extract and compute with both of them. The goal of the Annotation and Image Markup (AIM) project is to develop a mechanism for modeling, capturing, and serializing image annotation and markup data that can be adopted as a standard by the medical imaging community. The AIM project produces both human- and machine-readable artifacts. This paper describes the AIM information model, schemas, software libraries, and tools so as to prepare researchers and developers for their use of AIM. PMID:19294468
Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS values, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS > 25) between the number of genes annotated by each source and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a
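To make the idea of a weighted annotation score concrete, here is a hypothetical sketch in the spirit of GIFtS: a weighted fraction of annotation categories in which a gene has at least one entry. The category names, weights, and scaling are invented for illustration and are not the actual GeneCards algorithm.

```python
# Hypothetical GIFtS-style score: weighted share of populated
# annotation categories, scaled to [0, 100]. Weights are assumptions.
WEIGHTS = {"GO": 3.0, "pathways": 2.0, "interactions": 2.0,
           "phenotypes": 1.5, "publications": 1.0}

def annotation_score(gene_annotations, weights=WEIGHTS):
    """Return 100 * (summed weight of non-empty categories) / (total weight)."""
    total = sum(weights.values())
    hit = sum(w for cat, w in weights.items() if gene_annotations.get(cat))
    return 100.0 * hit / total

# A gene with GO and pathway entries but no recorded publications
gene = {"GO": ["GO:0006915"], "pathways": ["apoptosis"], "publications": []}
print(annotation_score(gene))   # ≈ 52.6 under these assumed weights
```

Under such a scheme, a bimodal score distribution would arise naturally if most genes either populate nearly all categories or almost none.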
Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E
We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564
Damien E. ZOMAHOUN
In the quest for models that could help to represent the meaning of images, some approaches have used contextual knowledge by building semantic hierarchies. Others have resorted to integrating image analysis improvement knowledge and image interpretation using ontologies. Images are often annotated with a set of keywords (or ontologies), whose relevance remains highly subjective and related to only one interpretation (one annotator). However, an image can have many associated semantics because annotators can interpret it differently. The purpose of this paper is to propose a collaborative annotation system that brings out the meaning of images from the different interpretations of annotators. The different works carried out in this paper lead to a semantic model of an image, i.e. the different meanings that a picture may have. This method relies on the different tools of the Semantic Web, especially ontologies.
Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to
Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E; Thomas, Paul D
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods. PMID:21873635
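The gain/loss reasoning described above can be sketched as a toy tree traversal: a function asserted at an ancestral node is inherited by descendants unless explicitly lost. The tree, the GO labels, and the gain/loss events below are invented for illustration; PAINT's actual curation workflow is far richer.

```python
# Toy sketch of phylogenetic annotation propagation (invented data):
# annotations gained at a node flow to descendants unless lost.
def infer(tree, node, inherited=frozenset()):
    """Yield (leaf, annotation set) after applying gains/losses down the tree."""
    gains = tree[node].get("gain", set())
    losses = tree[node].get("loss", set())
    current = (inherited | gains) - losses
    children = tree[node].get("children", [])
    if not children:
        yield node, current
    for child in children:
        yield from infer(tree, child, current)

# Hypothetical family: kinase activity gained ancestrally, lost in yeast
TREE = {
    "ancestor": {"gain": {"GO:kinase"}, "children": ["human", "yeast"]},
    "human":    {"gain": {"GO:nuclear"}},
    "yeast":    {"loss": {"GO:kinase"}},
}
inferred = dict(infer(TREE, "ancestor"))
print(inferred)
```

Recording the gain and loss events themselves, rather than only the leaf annotations, is what lets a curator attach evidence to each assertion.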
This study examines the effects of Facebook as a tool in French language instruction, analyzing the activities involved in using Facebook in a French-learning setting. To conduct the study, we set up a Facebook group with 23 students of the French Preparatory Program and the three teachers who lead the program in the Department of Translation and Interpreting, Faculty of Letters and Sciences, University of Mersin, during the 2014-2015 fall semester. In this closed group, the teachers shared course topics, images, texts and videos, and asked their students to answer questions in the group. The students also participated in the activities by commenting under the shared posts, answering the questions asked, and making their own posts on the platform. Course announcements were also shared in the group. In the application section, extracts of the students' work were anonymized to protect their right to privacy; names and profile photos were hidden. During the application period, the observation technique was used, and at the end of the application a student survey was administered as part of this study. Survey data were analyzed using the content analysis technique.
Cuellar-Partida, Gabriel; Renteria, Miguel E.; Macgregor, Stuart
Background Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. Results We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservati...
Hill, James A.; Smith, Bryan E.; Papoulias, Panagiotis G.; Andrews, Philip C.
ProteomeCommons.org has implemented a resource that incorporates concepts of Web 2.0 social networking for collaborative annotation of data sets placed in the Tranche repository. The annotation tools are part of a project management resource that is effective for individual laboratories or large distributed groups. The creation of the resource was motivated by the need for a way to encourage annotation of data sets with high accuracy and compliance rates. The system is designed to respond to ...
Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
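A minimal sketch of the weighted-consensus idea follows (EVM itself scores introns, exons and splice sites far more elaborately): each evidence source proposes a gene structure, sources carry weights, and the best-supported structure wins. The structures and weights below are invented for illustration.

```python
# Toy weighted consensus (not EVM's algorithm): pick the candidate
# gene structure with the highest summed evidence weight.
def weighted_consensus(predictions, weights):
    """predictions: {source: structure}; returns the best-supported structure."""
    support = {}
    for source, structure in predictions.items():
        support[structure] = support.get(structure, 0.0) + weights[source]
    return max(support, key=support.get)

# Hypothetical exon-coordinate structures from four evidence sources
preds = {"ab_initio_A": ((100, 250), (300, 480)),
         "ab_initio_B": ((100, 250), (310, 480)),
         "protein_aln": ((100, 250), (300, 480)),
         "est_aln":     ((100, 250), (300, 480))}
w = {"ab_initio_A": 1.0, "ab_initio_B": 1.0, "protein_aln": 5.0, "est_aln": 10.0}
print(weighted_consensus(preds, w))   # ((100, 250), (300, 480))
```

Weighting alignment evidence above ab initio predictions, as in this toy configuration, mirrors the intuition that transcript and protein alignments are stronger evidence than pure prediction.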
Gao, Jing; Ade, Alex S; Tarcea, V. Glenn; Weymouth, Terry E.; Mirel, Barbara R.; Jagadish, H.V.; States, David J.
Summary: The MiMI molecular interaction repository integrates data from multiple sources, resolves interactions to standard gene names and symbols, links to annotation data from GO, MeSH and PubMed and normalizes the descriptions of interaction type. Here, we describe a Cytoscape plugin that retrieves interaction and annotation data from MiMI and links out to multiple data sources and tools. Community annotation of the interactome is supported.
Cattuto, Ciro; Baldassarri, Andrea; Schehr, G; Loreto, Vittorio
The enormous increase of popularity and use of the WWW has led in the recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with text keywords dubbed tags. Understanding the rich emerging structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular concepts borrowed from statistical physics, such as random walks, and the complex networks framework, can effectively contribute to the mathematical modeling of social annotation systems. Here we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of random walks. This modeling framework reproduces several aspects, so far unexplained, of social annotation, among which the peculiar growth of the size of the...
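The modeling idea above can be sketched as follows: each user's tag stream is generated by a short random walk on an underlying semantic graph. The toy graph is an assumption; only the mechanism matters.

```python
# Minimal sketch of social annotation as random-walk exploration of a
# semantic graph (toy graph; not the authors' model or data).
import random

SEMANTIC_GRAPH = {
    "photo":   ["image", "camera"],
    "image":   ["photo", "picture"],
    "camera":  ["photo", "lens"],
    "picture": ["image"],
    "lens":    ["camera"],
}

def annotate(start, steps, rng):
    """One user's tag stream: a random walk from a seed concept."""
    tags, node = [start], start
    for _ in range(steps):
        node = rng.choice(SEMANTIC_GRAPH[node])
        tags.append(node)
    return tags

rng = random.Random(0)       # seeded for reproducibility
stream = annotate("photo", 5, rng)
print(stream)
```

Aggregating many such walks over the same graph yields tag co-occurrence statistics, which is the level at which the paper compares the model with empirical annotation data.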
Martinez Alonso, Hector
Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words like "London" (Location/Organization) or "cup" (Container/Content). The goal of this dissertation is to assess whether metonymic sense underspecification justifies incorporating a third sense into our sense inventories, thereby treating the underspecified sense as independent from the literal and metonymic. We have conducted an analysis in English, Danish and Spanish. Later on, we have tried to replicate the human judgments by means of unsupervised and semi-supervised sense prediction. The automatic sense-prediction systems have been unable to find empiric evidence for the underspecified sense, even...
The paper deals with the possibilities of utilising biogas for energy purposes through the combined production of electric power and heat. As an example of such cogeneration units, the urban waste water treatment plant at Lucenec is given. (authors)
Markowitz, Victor M.; Mavromatis, Konstantinos; Ivanova, Natalia N.; Chen, I-Min A.; Chu, Ken; Kyrpides, Nikos C.
A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.
Software frameworks and architectures are in need of meta data to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing meta data, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings; however, the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach to meta data representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programming language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interface (API) calls, is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies and physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to its originating code. Since models and
The buffy coat technique (BCT), a parasitological test, and an indirect antibody ELISA (Ab-ELISA) were used to detect trypanosome infections in blood and serum samples, respectively, collected from N'Dama cattle exposed to natural high tsetse challenge. These two diagnostic tools were also used to assess trypanosomal status in sequentially collected blood and serum samples from two groups of 5 N'Dama cattle each, experimentally challenged with Trypanosoma congolense and T. vivax. In both studies, packed red cell volume (PCV) and live weight were measured. The specificity of the Ab-ELISA was computed by testing approximately 70 serum samples obtained from a cattle population kept under zero tsetse challenge. The specificity was found to be 95.8% for T. vivax and 97.1% for T. congolense. In the field study, 3.9% (12/310) of blood samples were parasitologically positive. In corresponding serum samples the prevalence of positive trypanosome sero-reactors was 54.8% (170/310). However, antibodies against trypanosomes persisted in serum when blood samples were no longer parasitologically positive. In both blood and serum samples, T. vivax was found to be the main infecting species. The sensitivity of the Ab-ELISA for T. vivax was 81.8%. Due to the extremely low number of T. congolense infections (only one) detected by BCT, the sensitivity for that trypanosome species was not computed. In the experimentally challenged cattle, 80% (24/30) and 33.3% (10/30) of blood samples were BCT positive for T. congolense and T. vivax, respectively. Antibodies in corresponding sera were present in 69% (20/29) and 96.3% (26/27) of animals challenged with T. congolense and T. vivax, respectively. The serological assay for T. congolense antibody detection exhibited high cross-reactivity with T. vivax antigens, as assessed in sera collected from T. vivax-infected animals. In the field study, cattle showing the presence of antibodies against T. congolense and/or T. vivax had
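For reference, the two diagnostic measures used above reduce to simple ratios over a 2x2 table. The counts below are hypothetical, since the abstract reports only the resulting percentages.

```python
# Worked example of diagnostic sensitivity and specificity
# (hypothetical counts; the abstract gives only the percentages).
def sensitivity(tp, fn):
    """True-positive rate: detected infections / all true infections."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: correct negatives / all true negatives."""
    return tn / (tn + fp)

# e.g. 9 Ab-ELISA positives among 11 confirmed T. vivax infections
# would yield the reported 81.8% sensitivity
print(round(100 * sensitivity(tp=9, fn=2), 1))   # 81.8
```

Note that specificity is estimated from the tsetse-free herd (no true infections), which is why the two measures come from different sample sets in the study.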
Kumar, Kamal; Desai, Valmik; Cheng, Li; Khitrov, Maxim; Grover, Deepak; Satya, Ravi Vijaya; Yu, Chenggang; Zavaljevski, Nela; Reifman, Jaques
Background The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. Methodology The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions. PMID:21408217
Sun, Yanyan; Gao, Fei
Web annotation is a Web 2.0 technology that allows learners to work collaboratively on web pages or electronic documents. This study explored the use of Web annotation as an online discussion tool by comparing it to a traditional threaded discussion forum. Ten graduate students participated in the study. Participants had access to both a Web…
Background Automated extraction systems have become a time-saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to biomedical text mining tools is that of learning to recognise a wide variety of biological concepts with different functional roles to assist in these processes. Results Here, we present a novel corpus concerning the integrated cellular responses to nutrient starvation in the model organism Escherichia coli. Our corpus is a unique resource in that it annotates biomedical concepts that play a functional role in expression, regulation and metabolism. Namely, it includes annotations for genetic information carriers (genes and DNA), RNA molecules, proteins (transcription factors, enzymes and transporters), small metabolites, physiological states and laboratory techniques. The corpus consists of 130 full-text papers with a total of 59043 annotations for 3649 different biomedical concepts; the two dominant classes are genes (highest number of unique concepts) and compounds (most frequently annotated concepts), whereas other important cellular concepts such as proteins account for no more than 10% of the annotated concepts. Conclusions To the best of our knowledge, a corpus that details such a wide range of biological concepts has never been presented to the text mining community. The inter-annotator agreement statistics provide evidence of the importance of a consolidated background when dealing with such complex descriptions, the ambiguities naturally arising from the terminology, and their impact for modelling purposes. Availability is granted for the full-text corpora of 130 freely accessible documents, the annotation scheme and the annotation guidelines. Also, we include a corpus of 340 abstracts.
Barrell, Daniel; Dimmer, Emily; Huntley, Rachael P; Binns, David; O'Donovan, Claire; Apweiler, Rolf
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set. PMID:18957448
We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT...
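The logical-consistency check that CAVaT performs on temporal links can be illustrated in its simplest form: a set of "a BEFORE b" links is consistent only if the resulting directed graph is acyclic, i.e. the events admit a linear temporal order. A hedged sketch; CAVaT's actual verification methods are richer, and the function and event names here are invented:

```python
def consistent(before_links):
    """Return True iff the 'a BEFORE b' links contain no cycle."""
    graph = {}
    for a, b in before_links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def ordered(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting:           # back edge: temporal cycle
                return False
            if nxt not in done and not ordered(nxt):
                return False
        visiting.discard(node)
        done.add(node)
        return True

    return all(ordered(n) for n in list(graph) if n not in done)

print(consistent([("e1", "e2"), ("e2", "e3")]))                  # acyclic
print(consistent([("e1", "e2"), ("e2", "e3"), ("e3", "e1")]))    # cyclic
```

A full TimeML validator must also handle the other relation types (AFTER, INCLUDES, SIMULTANEOUS, ...) by first normalising them into a common algebra; the acyclicity test above is only the core of that computation.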
Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
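The combined-evidence idea can be sketched as interval merging: overlapping predictions from independent detectors are pooled, and merged regions are kept only when enough methods support them. A simplified illustration on a single sequence with invented coordinates; the real pipeline's evidence combination is considerably more sophisticated:

```python
def merge_evidence(calls, min_support=2):
    """Merge overlapping (method, start, end) calls from several TE
    detectors on one sequence; keep merged regions supported by at
    least `min_support` distinct methods."""
    merged = []
    for method, start, end in sorted(calls, key=lambda c: (c[1], c[2])):
        if merged and start <= merged[-1][1]:
            # overlaps the current region: extend it and record support
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(method)
        else:
            merged.append([start, end, {method}])
    return [(s, e, sorted(m)) for s, e, m in merged if len(m) >= min_support]

calls = [("RepeatMasker", 100, 250), ("RECON", 200, 300),
         ("BLASTER", 500, 600), ("TE-HMM", 240, 320)]
print(merge_evidence(calls))
```

With the default threshold, the lone BLASTER call at 500-600 is dropped while the triply supported region at 100-320 survives, which is the intuition behind preferring multi-method evidence to any single program's output.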
Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea; Bernsel, Andreas; Tress, Michael L.; Bork, Peer; Von Heijne, Gunnar; Valencia, Alfonso; A Ouzounis, Christos; Casadio, Rita; Brunak, Søren
A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....
Loewenstein, Yaniv; Raimondo, Domenico; Redfern, Oliver C.; Watson, James; Frishman, Dmitrij; Linial, Michal; Orengo, Christine; Thornton, Janet; Tramontano, Anna
With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
Syed Mubashir Ali
This study is undertaken to test whether or not there exists gender bias in the health care utilisation of sick children in Pakistan. Overall, the results are encouraging, as medical consultation has been sought by a very high proportion (79 percent) of sick children. Moreover, there do not appear to be significant differences by gender in health care utilisation, be it curative or preventive. This is so in spite of the fact that many studies on various gender-related issues in Pakistan have ...
Summary Objective To summarize the best papers in the field of Knowledge Representation and Management (KRM). Methods A comprehensive review of medical informatics literature was performed to select some of the most interesting papers of KRM published in 2014. Results Four articles were selected, two focused on annotation and information retrieval using an ontology. The two others focused mainly on ontologies, one dealing with the usage of a temporal ontology in order to analyze the content of narrative document, one describing a methodology for building multilingual ontologies. Conclusion Semantic models began to show their efficiency, coupled with annotation tools. PMID:26293860
Resource provisioning based on virtual machines (VMs) has been widely accepted and adopted in cloud computing environments. A key problem resulting from using static scheduling approaches for allocating VMs on different physical machines (PMs) is that resources tend not to be fully utilised. Although some existing cloud reconfiguration algorithms have been developed to address the problem, they normally result in high migration costs and low resource utilisation because they ignore the multi-dimensional characteristics of VMs and PMs. In this paper we present and evaluate a new algorithm for improving resource utilisation for cloud providers. Using a multivariate probabilistic model, our algorithm selects suitable PMs for VM re-allocation, which are then used to generate a reconfiguration plan. We also describe two heuristic metrics that can be used in the algorithm to capture the multi-dimensional characteristics of VMs and PMs. By combining these two heuristic metrics in our experiments, we observed that our approach improves the resource utilisation level by around 8% for cloud providers, such as IC Cloud, which accept user-defined VM configurations, and 14% for providers, such as Amazon EC2, which only provide limited types of VM configurations.
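A much-simplified sketch of multi-dimensional VM placement; the best-fit scoring heuristic and the machine names below are invented for illustration and are not the paper's multivariate probabilistic model:

```python
def best_pm(vm, pms):
    """Pick a PM for re-allocating `vm` (cpu, ram demands) by best-fit
    over two dimensions: prefer the PM with the smallest residual
    'volume' (cpu_free * ram_free) after placement, packing tightly."""
    feasible = {name: (cpu - vm[0], ram - vm[1])
                for name, (cpu, ram) in pms.items()
                if cpu >= vm[0] and ram >= vm[1]}
    if not feasible:
        return None  # no PM can host this VM without overcommitting
    return min(feasible, key=lambda n: feasible[n][0] * feasible[n][1])

# hypothetical PMs with (free cpu cores, free ram GB)
pms = {"pm1": (8, 16), "pm2": (4, 4), "pm3": (16, 8)}
print(best_pm((3, 3), pms))
```

The point of treating both dimensions jointly is visible even here: a CPU-only heuristic would send the VM to pm3 and strand its RAM, whereas the two-dimensional score picks the host it fills most completely.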
Energetic utilisation of biomass has been known since prehistoric times and was only pushed into the background by the technological developments of the last century. The energy crisis and, more recently, environmental problems have now brought it back to the fore, and efforts are being made worldwide to find modern technical applications for biomass and contribute to its advance. (orig.)
Background: Information extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results: We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66%-90%. Conclusion: The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may
Cattuto, Ciro; Barrat, Alain; Baldassarri, Andrea; Schehr, Gregory; Loreto, Vittorio
The enormous increase of popularity and use of the worldwide web has led in the recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with keywords known as "tags." Understanding the rich emergent structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular concepts borrowed from statistical physics, such as random walks (RWs), and complex networks theory, can effectively contribute to the mathematical modeling of social annotation systems. Here, we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of RWs. This modeling framework reproduces several aspects, thus far unexplained, of social annotation, among which are the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and that are typically hard to access. PMID:19506244
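The random-walk picture of social annotation described above can be reproduced in a few lines: repeated walks on a fixed semantic graph yield a shared vocabulary that grows sublinearly in the number of walks. A toy simulation; the graph construction and all parameters are arbitrary choices for illustration, not the authors' model:

```python
import random

def tag_vocabulary_growth(n_nodes=2000, n_edges=8000,
                          walks=300, steps=10, seed=1):
    """Simulate social annotation as random walks on a random semantic
    graph; return the community vocabulary size after each walk."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n_nodes)}
    for _ in range(n_edges):                 # random undirected multigraph
        a, b = rng.randrange(n_nodes), rng.randrange(n_nodes)
        adj[a].append(b)
        adj[b].append(a)
    vocab, growth = set(), []
    for _ in range(walks):                   # one walk per annotation act
        node = rng.randrange(n_nodes)
        for _ in range(steps):
            vocab.add(node)                  # each visited node is a "tag"
            if adj[node]:
                node = rng.choice(adj[node])
        growth.append(len(vocab))
    return growth

g = tag_vocabulary_growth()
print(g[0], g[-1])   # early walks add many new tags, later ones fewer
```

Because later walks increasingly revisit nodes already in the vocabulary, the growth curve flattens, which is the qualitative signature the paper explains with this random-walk framework.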
Stank, Antonia; Richter, Stefan; Wade, Rebecca C
PROtein Structure Annotation Tool-plus (ProSAT+) is a new web server for mapping protein sequence annotations onto a protein structure and visualizing them simultaneously with the structure. ProSAT+ incorporates many of the features of the preceding ProSAT and ProSAT2 tools but also provides new options for the visualization and sharing of protein annotations. Data are extracted from the UniProt Knowledgebase, the RCSB PDB and the PDBe SIFTS resource, and visualization is performed using JSmol. User-defined sequence annotations can be added directly to the URL, thus enabling visualization and easy data sharing. ProSAT+ is available at http://prosat.h-its.org. PMID:27284084
Mayer Klaus FX
Background: Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This indirect database upload is laborious and creates problems of data synchronicity. Results: To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. Conclusion: This Apollo web service adapter, Apollo2Go, simplifies data exchange in distributed projects and aims to make the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.
Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva
As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org. PMID:22859749
Pedersen, Brent S; Layer, Ryan M; Quinlan, Aaron R
The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file. By leveraging a parallel "chromosome sweeping" algorithm, we demonstrate substantial performance gains by annotating ~85,000 variants per second with 50 attributes from 17 commonly used genome annotation resources. Vcfanno is available at https://github.com/brentp/vcfanno under the MIT license. PMID:27250555
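vcfanno is driven by a TOML configuration that names each annotation file, the fields to extract, and the operation to apply to overlapping records. A configuration sketch; the file names and INFO field names below are assumptions for illustration, not a tested recipe:

```toml
# conf.toml - sketch; annotation files and INFO fields are assumed
[[annotation]]
file = "ExAC.vcf.gz"            # population allele counts
fields = ["AC", "AN"]           # INFO fields to pull from ExAC
ops = ["self", "self"]          # "self" copies the value as-is
names = ["exac_ac", "exac_an"]  # names written into the output INFO column

[[annotation]]
file = "dbsnp.vcf.gz"
fields = ["CAF"]
ops = ["self"]
names = ["dbsnp_caf"]
```

With such a file in place, annotation is a single command of the form `vcfanno conf.toml input.vcf > annotated.vcf`, and the extracted attributes appear under the chosen names in the INFO column of the output VCF.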
Background: Ranked gene lists from microarray experiments are usually analysed by assigning significance to predefined gene categories, e.g., based on functional annotations. Tools performing such analyses are often restricted to a category score based on a cutoff in the ranked list and a significance calculation based on random gene permutations as the null hypothesis. Results: We analysed three publicly available data sets, in each of which samples were divided into two classes and genes ranked according to their correlation to class labels. We developed a program, Catmap (available for download at http://bioinfo.thep.lu.se/Catmap), to compare different scores and null hypotheses in gene category analysis, using Gene Ontology annotations for category definition. When a cutoff-based score was used, results depended strongly on the choice of cutoff, introducing an arbitrariness into the analysis. Comparing results using random gene permutations and random sample permutations, respectively, we found that the assigned significance of a category depended strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller p-values for large categories with many coexpressed genes. Conclusions: In gene category analyses of ranked gene lists, a cutoff-independent score is preferable. The choice of null hypothesis is very important; random gene permutations do not work well as an approximation to sample label permutations.
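The gene-permutation null discussed above can be sketched directly: score a category by the mean rank of its members in the ranked list and ask how often a random gene set of the same size scores at least as well. A minimal illustration with an invented category and ranks (1 = top of the list); note this is exactly the null the abstract warns can be anticonservative for coexpressed categories:

```python
import random

def category_pvalue(category_ranks, n_genes, n_perm=10000, seed=0):
    """Empirical p-value for a gene category under a random-gene-
    permutation null: probability that a random set of the same size
    has a mean rank at least as close to the top of the list."""
    rng = random.Random(seed)
    k = len(category_ranks)
    observed = sum(category_ranks) / k
    hits = 0
    for _ in range(n_perm):
        null_set = rng.sample(range(1, n_genes + 1), k)
        if sum(null_set) / k <= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# hypothetical category of 5 genes ranking near the top of 1000 genes
print(category_pvalue([3, 10, 25, 40, 80], 1000))
```

A mean rank of 31.6 against a null expectation near 500 yields a tiny p-value; swapping in a sample-label-permutation null would require re-ranking the genes for each permuted labelling, which is why the two nulls can disagree so strongly.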
Mokeddem, Hakim; Desmoulins, Cyrille
Automated reasoning in ontology-based technology-enhanced learning systems (EIAH) most often relies on ad hoc algorithms, owing to the representation languages used. Using the MemoNote annotation software as an example, this article sets out the advantages of using standard Semantic Web technologies to enable reasoning over ontologies. Starting from the limitations of the current MemoNote version, which is based on the Frames language, the use of a representation...
Guarda, Paolo; Kiyavitskaya, Nadzeya; Zannone, Nicola
The increasing complexity of software systems and growing demand for regulations compliance require effective methods and tools to support requirements analysts activities. In order to facilitate alignment of software system requirements and regulations, systematic methods and tools automating regulations analysis must be developed. This work explores applicability of the semantic annotation tool Cerno to mining of rights and obligations from European privacy directives.
Graff, Heidi Jeannet; Siersma, Volkert Dirk; Kragstrup, Jakob;
Introduction: Several studies have documented that international adoptees have an increased occurrence of health problems and contacts with the health-care system after arriving in their new country of residence. This may be explained by pre-adoption adversities, especially for the period immediately after adoption. Our study aimed to assess health-care utilisation of international adoptees in primary and secondary care for somatic and psychiatric diagnoses in a late post-adoption period. Is there an increased use of the health-care system in this period, even when increased morbidity in the group of international adoptees is taken into consideration? Methods: This was a Danish register-based cohort study examining health-care utilisation in a multivariable two-part model. The prevalence of selected outcomes and the quantity of use were assessed in a late (year three, four and five) post-adoption period. The cohort...
Covington, William G., Jr.
This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)
This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.
Massachusetts Dept. of Education, Boston. Bureau of Nutrition Education and School Food Services.
This annotated bibliography on nutrition is for the use of teachers at the elementary grade level. It contains a list of books suitable for reading about nutrition and foods for pupils from kindergarten through the sixth grade. Films and audiovisual presentations for classroom use are also listed. The names and addresses from which these materials…
Lindstrøm, Bo; Wells, Lisa Marie
Coloured Petri nets (CP-nets) can be used for several fundamentally different purposes such as functional analysis, performance analysis, and visualisation. To be able to use the corresponding tool extensions and libraries it is sometimes necessary to include extra auxiliary information in the CP-net. An example of such auxiliary information is a counter which is associated with a token to be able to do performance analysis. Modifying colour sets and arc inscriptions in a CP-net to support a specific use may lead to the creation of several slightly different CP-nets, only to support the different uses of the same basic CP-net. One solution to this problem is that the auxiliary information is not integrated into the colour sets and arc inscriptions of a CP-net, but is kept separately. This makes it easy to disable the auxiliary information if a CP-net is to be used for another purpose. This paper...
Bates, Eric; Hinch, Peter
The objective of this research was to ascertain the usefulness of utilising social media to enhance graduate attributes. This study was conducted during one semester and concentrated on one aspect of graduate attributes: interview skills. Two videos were scripted, shot and edited that focused on interviews from the perspective of both the interviewer and the interviewee. These videos were incorporated into workshops with first-year and second-year level 8 undergraduate students. Pre...
Napel, ten, H.M.Th.D.; Bianchi, F.J.J.A.; Bestman, M.W.P.
This paper explores the potential of utilising robust crops and livestock for improving sustainability of agriculture. Two approaches for dealing with unwanted fluctuations that may influence agricultural production, such as diseases and pests, are discussed. The prevailing approach, which we call the ‘Control Model’, is to protect crops and livestock from disturbances as much as possible, to regain balance with monitoring and intervention and to look for add-on solutions only. There are a nu...
Bada Michael; Eckert Miriam; Evans Donald; Garcia Kristin; Shipley Krista; Sitnikov Dmitry; Baumgartner William A; Cohen K; Verspoor Karin; Blake Judith A; Hunter Lawrence E
Background: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results: This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRA...
X.J. Wang; L. Zhang; X. Li; W.Y. Ma
Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results
Improving information retrieval is annotation's central goal. However, without sufficient planning, annotation - especially when running a robot and attaching automatically extracted content - risks producing incoherent information. The author recommends answering eight questions before you annotate. He provides a practical application of this approach, and discusses applying the questions to other systems.
Zhang, Fan; Litman, Diane
This paper explores the annotation and classification of students' revision behaviors in argumentative writing. A sentence-level revision schema is proposed to capture why and how students make revisions. Based on the proposed schema, a small corpus of student essays and revisions was annotated. Studies show that manual annotation is reliable with…
Salzberg, Steven L.
The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.
Next generation sequencing (NGS) has revolutionized the field of genomics and its wide range of applications has resulted in the genome-wide analysis of hundreds of species and the development of thousands of computational tools. This thesis represents my work on NGS analysis of four species, Lotus japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts: genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and mycorrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...
Vitanovski, Dime; Schaller, Christian; Hahn, Dieter; Daum, Volker; Hornegger, Joachim
Although medical scanners are rapidly moving towards a three-dimensional paradigm, the manipulation and annotation/labeling of the acquired data is still performed in a standard 2D environment. Editing and annotation of three-dimensional medical structures is currently a complex and rather time-consuming task, as it is carried out in 2D projections of the original object. A major problem in 2D annotation is depth ambiguity, which requires 3D landmarks to be identified and localized in at least two of the cutting planes. Operating directly in a three-dimensional space enables the implicit consideration of the full 3D local context, which significantly increases accuracy and speed. A three-dimensional environment is also more natural, optimizing the user's comfort and acceptance. The 3D annotation environment requires a three-dimensional manipulation device and display. By means of two novel and advanced technologies, the Nintendo Wii controller and the Philips 3D WoWvx display, we define an appropriate 3D annotation tool and a suitable 3D visualization monitor. We define a non-coplanar setting of four infrared LEDs with a known and exact position, which are tracked by the Wii and from which we compute the pose of the device by applying a standard pose estimation algorithm. The novel 3D renderer developed by Philips uses either the Z-value of a 3D volume, or it computes the depth information out of a 2D image, to provide a real 3D experience without requiring special glasses. Within this paper we present a new framework for manipulation and annotation of medical landmarks directly in a three-dimensional volume.
Metadata is used to describe documents and applications, improving information seeking and retrieval and its understanding and use. Metadata can be expressed in a wide variety of vocabularies and languages, and can be created and maintained with a variety of tools. Ontology based annotation refers to the process of creating metadata using ontologies as their vocabularies. We present similarities and differences with respect to other approaches for metadata creation, and describe languages and...
Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.
Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has been established. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting-edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury, as the interface requires a metadata record to be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, highlighting the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well-annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the
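The suggestion step can be sketched with plain text similarity: find the well-annotated record most similar to the poorly annotated one and propose its keywords. The sketch below substitutes cosine similarity over raw term counts for the paper's topic-model similarity, and all records and keywords are invented:

```python
import math
from collections import Counter

def suggest_keywords(poor_text, well_annotated, top_n=3):
    """Suggest keywords for a poorly annotated metadata record by
    cosine similarity of its free text to well-annotated records."""
    def vec(text):
        return Counter(text.lower().split())

    def cos(u, v):
        dot = sum(u[t] * v[t] for t in u)
        norm = (math.sqrt(sum(c * c for c in u.values()))
                * math.sqrt(sum(c * c for c in v.values())))
        return dot / norm if norm else 0.0

    q = vec(poor_text)
    best = max(well_annotated, key=lambda rec: cos(q, vec(rec["text"])))
    return best["keywords"][:top_n]

# hypothetical well-annotated records harvested from data repositories
records = [
    {"text": "stream temperature sensor observations river",
     "keywords": ["hydrology", "temperature", "in-situ sensing"]},
    {"text": "bird migration tracking telemetry",
     "keywords": ["ecology", "migration", "telemetry"]},
]
print(suggest_keywords("river temperature measurements", records))
```

A topic model generalises this by comparing records in a low-dimensional topic space rather than raw term space, so records sharing no literal words can still be matched.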
Cros, Marie-Josee; de Monte, Antoine; Mariette, Jérôme; Bardou, Philippe; Grenier-Boley, Benjamin; Gautheret, Daniel
The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene f...
Burghardt, Manuel; Wolff, Christian
We first discuss the problems of (syntactic) annotation of diachronic corpora and then present an evaluation study in which more than 50 annotation tools and frameworks were assessed against a profile of functional and software-ergonomic requirements based on the quality models of ISO/IEC 9126-1:2001 (Software engineering – Product quality – Part 1: Quality model) and ISO/IEC 25000:2005 (Software Engineering – Software product Quality Requirements and Evaluat...
Methods to assess, access and treat pathology within the gastrointestinal tract continue to evolve, with video endoscopy replacing radiology as the gold standard. Whilst endoscope technology develops further with the advent of newer higher-resolution chips, an array of adjuncts has been developed to enhance endoscopy in other ways; most notable is the use of magnets. Magnets are utilised in many areas, ranging from endoscopic training, lesion resection, and aiding manoeuvrability of capsule endoscopes, to assisting in easy placement of tubes for nutritional feeding. Some of these are still at an experimental stage, whilst others are being increasingly incorporated in our everyday practice.
Loh, Xian Jun; Lee, Tung-Chun; Dou, Qingqing; Deen, G Roshan
The delivery of genetic materials into cells to elicit cellular responses has been extensively studied by biomaterials scientists globally. Many materials such as lipids, peptides, viruses, synthetically modified cationic polymers and certain inorganic nanomaterials could be used to complex the negatively charged plasmids and deliver the formed package into cells. The recent literature on the delivery of genetic materials utilising inorganic nanoparticles is carefully examined in this review. We have picked out the most relevant references and concisely summarised the findings with illustrated examples. We further propose alternative approaches and suggest future pathways towards the practical use of multifunctional nanocarriers. PMID:26484365
Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S.; Putman, Timothy E.; Ainscough, Benjamin J.; Griffith, Obi L.; Torkamani, Ali; Whetzel, Patricia L.; Mungall, Christopher J.; Mooney, Sean D; Su, Andrew I
Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-...
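Queries against such services are plain HTTP GETs. A hedged sketch that only constructs the request URL for the MyGene.info v3 query endpoint; the parameter values are illustrative, and real use would fetch and parse the JSON response:

```python
from urllib.parse import urlencode

def mygene_query_url(q, fields=("symbol", "name", "taxid"), size=5):
    """Build a query URL for the MyGene.info v3 REST service.
    `q` is a query string, e.g. 'symbol:CDK2'; `fields` restricts the
    annotation fields returned; `size` caps the number of hits."""
    base = "https://mygene.info/v3/query"
    params = {"q": q, "fields": ",".join(fields), "size": size}
    return base + "?" + urlencode(params)

print(mygene_query_url("symbol:CDK2"))
```

Keeping URL construction separate from the network call makes the query logic testable offline and easy to reuse against a batch of gene symbols.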
Letunic, I.; Goodstadt, L.; Dickens, N.J.; Doerks, T.; Schultz, J; R. Mott; Ciccarelli, F.; Copley, R. R.; Ponting, C. P.; Bork, P.
SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models....
Kamps, Jaap; Karlgren, Jussi; Schenkel, Ralf
There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful semantic annotations hold the promise to significantly enhance information access by enhancing the depth of analysis of today's systems. Currently, we have only started exploring the possibilities and are only beginning to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive for...
Prokopidis, Prokopis; Papavassiliou, Vassilis; Toral, Antonio; Poch Riera, Marc; Frontini, Francesca; Rubino, Francesco; Thurmair, Gregor
PANACEA WP4 targets the creation of a Corpus Acquisition and Annotation (CAA) subsystem for the acquisition and processing of monolingual and bilingual language resources (LRs). The CAA subsystem consists of tools that have been integrated as web services in the PANACEA platform of LR production. D4.2 Initial functional prototype and documentation in T13 and D4.4 Report on the revised Corpus Acquisition & Annotation subsystem and its components in T23 provided initial and updated documentatio...
Legeai, Fabrice; Shigenobu, Shuji; Gauthier, Jean-Pierre; Colbourne, John; Rispe, Claude; Collin, Olivier; Richards, Stephen; Wilson, Alex C. C.; Tagu, Denis
AphidBase is a centralized bioinformatic resource that was developed to facilitate community annotation of the pea aphid genome by the International Aphid Genomics Consortium (IAGC). The AphidBase Information System, designed to organize and distribute genomic data and annotations for a large international community, was constructed using open source software tools from the Generic Model Organism Database (GMOD). The system includes Apollo and GBrowse utilities as well as a wiki, blast search c...
Ball, Elaine C
The paper examines hand-written annotation, its many features, difficulties and strengths as a feedback tool. It extends and clarifies what modest evidence is in the public domain and offers an evaluation of how to use annotation effectively in the support of student feedback [Marshall, C.M., 1998a. The Future of Annotation in a Digital (paper) World. Presented at the 35th Annual GLSLIS Clinic: Successes and Failures of Digital Libraries, June 20-24, University of Illinois at Urbana-Champaign, March 24, pp. 1-20; Marshall, C.M., 1998b. Toward an ecology of hypertext annotation. In: Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia, June 20-24, Pittsburgh, Pennsylvania, US, pp. 40-49; Wolfe, J.L., Neuwirth, C.M., 2001. From the margins to the centre: the future of annotation. Journal of Business and Technical Communication, 15(3), 333-371; Diyanni, R., 2002. One Hundred Great Essays. Addison-Wesley, New York; Wolfe, J.L., 2002. Marginal pedagogy: how annotated texts affect writing-from-source texts. Written Communication, 19(2), 297-333; Liu, K., 2006. Annotation as an index to critical writing. Urban Education, 41, 192-207; Feito, A., Donahue, P., 2008. Minding the gap: annotation as preparation for discussion. Arts and Humanities in Higher Education, 7(3), 295-307; Ball, E., 2009. A participatory action research study on handwritten annotation feedback and its impact on staff and students. Systemic Practice and Action Research, 22(2), 111-124; Ball, E., Franks, H., McGrath, M., Leigh, J., 2009. Annotation is a valuable tool to enhance learning and assessment in student essays. Nurse Education Today, 29(3), 284-291]. Although a significant number of studies examine annotation, these largely relate to on-line tools and computer-mediated communication, not hand-written annotation as comment, phrase or sign written on the student essay to provide critique. Little systematic research has been conducted to consider how this latter form
Background: Analysis of any newly sequenced bacterial genome starts with the identification of protein-coding genes. Despite the accumulation of multiple complete genome sequences, which provide useful comparisons with close relatives among other organisms during the annotation process, accurate gene prediction remains quite difficult. A major reason for this situation is that genes are tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remains difficult unless appropriate biological knowledge (about the structure of a gene) is embedded in the approach. Results: We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank. Conclusions: The results described in the present study show that our procedure is very satisfactory in terms of gene-finding accuracy. Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific
Kaptein, R.; Koot, G.; Huis in 't Veld, M.A.A.; Broek, E.L. van den
Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and thereby increases the efficiency o...
Harel, Arye; Dalah, Irina; Pietrokovski, Shmuel; Safran, Marilyn; Lancet, Doron
Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and management. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as file-based systems and relational databases, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.
Mueen, A.; Zainuddin, R.; Baba, M. Sapiyan
Image retrieval at the semantic level mostly depends on image annotation or image classification. Image annotation performance largely depends on three issues: (1) automatic image feature extraction; (2) semantic image concept modeling; (3) an algorithm for semantic image annotation. To address the first issue, multilevel features are extracted to construct the feature vector, which represents the contents of the image. To address the second issue, a domain-dependent concept hierarchy is constructed for...
This chapter from The Library Marketing Toolkit introduces several essential tools which every library needs as part of their marketing toolkit. These include developing the library website (including mobile version), utilising word-of-mouth promotion, obtaining and interpreting user feedback, perfecting the elevator pitch, and signs and displays. It includes case studies from David Lee King, and Aaron Tay, Rebecca Jones.
Have you ever read something in a book, felt the need to comment, taken up a pencil and scribbled something on the book's text? If so, you have annotated a book. But that process has now become something fundamental and revolutionary in these days of computing. Annotation is all about adding further information to text, pictures, movies and even to physical objects. In practice, anything which can be identified either virtually or physically can be annotated. In this book, we will delve into what makes annotations and analyse their significance for the future evolutions of the web. We wil
Senk, D.; Babich, A.; Gudenau, H. W. [Rhein Westfal TH Aachen, Aachen (Germany)]
Wastes and dusts from the steel industry, non-ferrous metallurgy and other branches can be utilised e.g. in agglomeration processes (sintering, pelletising or briquetting) and by injection into shaft furnaces. This paper deals with the second route. The combustion and reduction behaviour of iron- and carbon-rich metallurgical dusts and sludges containing lead, zinc and alkali, as well as other wastes with and without pulverised coal (PC), has been studied when injecting into shaft furnaces. The following shaft furnaces have been examined: blast furnace, cupola furnace, OxiCup furnace and imperial-smelting furnace. Investigations have been done at laboratory and industrial scale. Some dusts and wastes under certain conditions can not only be reused but can also improve combustion efficiency at the tuyeres as well as furnace performance and productivity.
The climate impact from the use of peat for energy production in Sweden has been evaluated in terms of contribution to atmospheric radiative forcing. This was done by attempting to answer the question 'What will be the climate impact if one were to use 1 m2 of mire for peat extraction during 20 years?'. Two different methods of after-treatment were studied: afforestation and restoration of wetland. The climate impact from a peatland - wetland energy scenario and a peatland - forestry energy scenario was compared to the climate impact from coal, natural gas and forest residues. Sensitivity analyses were performed to evaluate which parameters are important to take into consideration in order to minimize the climate impact from peat utilisation.
Tamaki, Satoshi; Arakawa, Kazuharu; Kono, Nobuaki; Tomita, Masaru
Annotations of complete genome sequences submitted directly from sequencing projects are diverse in terms of annotation strategies and update frequencies. These inconsistencies make comparative studies difficult. To allow rapid data preparation of a large number of complete genomes, automation and speed are important for genome re-annotation. Here we introduce an open-source rapid genome re-annotation software system, Restauro-G, specialized for bacterial genomes. Restauro-G re-annotates a genome by similarity searches utilizing the BLAST-Like Alignment Tool, referring to protein databases such as UniProt KB, NCBI nr, NCBI COGs, Pfam, and PSORTb. Re-annotation by Restauro-G achieved over 98% accuracy for most bacterial chromosomes in comparison with the original manually curated annotation of EMBL releases. Restauro-G was developed in the generic bioinformatics workbench G-language Genome Analysis Environment and is distributed at http://restauro-g.iab.keio.ac.jp/ under the GNU General Public License.
Brown, Alfred L.
Background: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST-matched sequences. Results: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence-similarity-based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
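The error-injection idea above can be sketched concretely: corrupt a perfectly labelled annotation set at several known rates, measure precision at each rate, and fit a regression line relating injected error to observed precision. This is an illustrative reconstruction under simplified assumptions (uniformly chosen wrong labels, exact error counts), not the authors' code; all function names here are invented for the sketch.

```python
import random

def inject_errors(labels, rate, label_pool, rng):
    """Return a copy of `labels` with a fraction `rate` replaced by
    randomly chosen *incorrect* labels (the known artificial error rate)."""
    corrupted = list(labels)
    n_errors = int(round(rate * len(corrupted)))
    for i in rng.sample(range(len(corrupted)), n_errors):
        corrupted[i] = rng.choice([l for l in label_pool if l != corrupted[i]])
    return corrupted

def precision(reference, predicted):
    """Fraction of predicted labels that agree with the reference."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(predicted)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

rng = random.Random(0)
gold = ["kinase"] * 50 + ["transporter"] * 50          # error-free reference
rates = [0.0, 0.1, 0.2, 0.3]
precisions = [precision(gold, inject_errors(gold, r, ["kinase", "transporter"], rng))
              for r in rates]
slope, intercept = fit_line(rates, precisions)
# With always-wrong injected labels, precision falls exactly as 1 - rate,
# so the fitted line recovers slope ~ -1 and intercept ~ 1.
```

In the paper's setting the regression is run against the precision of BLAST-based annotation transfer, and the unknown baseline error rate is read off from the fitted model rather than being exactly linear as in this toy.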
Ge, Dongliang; Ruzzo, Elizabeth K.; Shianna, Kevin V.; He, Min; Pelak, Kimberly; Heinzen, Erin L.; Need, Anna C.; Cirulli, Elizabeth T.; Maia, Jessica M.; Dickson, Samuel P.; Zhu, Mingfu; Singh, Abanish; Allen, Andrew S.; Goldstein, David B.
Summary: Here we present Sequence Variant Analyzer (SVA), a software tool that assigns a predicted biological function to variants identified in next-generation sequencing studies and provides a browser to visualize the variants in their genomic contexts. SVA also provides for flexible interaction with software implementing variant association tests allowing users to consider both the bioinformatic annotation of identified variants and the strength of their associations with studied traits. We illustrate the annotation features of SVA using two simple examples of sequenced genomes that harbor Mendelian mutations. Availability and implementation: Freely available on the web at http://www.svaproject.org. Contact: firstname.lastname@example.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21624899
Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. The functions of a protein, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology
Symeou, V; Leinonen, I; Kyriazakis, I
Simulation models of nutrient utilisation ignore that variation in pig system components can influence the predicted mean and variance of the performance of a group of pigs. The objective of this study was to develop a methodology to investigate how variation in feed composition would (a) affect the outputs of a nutrient utilisation model and (b) interact with variation that arises from the traits of individual pigs. We used a P intake and utilisation model to address these characteristics. Introduction of stochasticity gave rise to a number of methodological challenges--for example, how to generate variation in both feed composition and pigs and account for correlations between ingredients when modelling variation associated with feed mixing efficiency. Introducing variation in feed composition and pig phenotype resulted in moderate decreases in mean digested, retained and excreted P predicted for a population of pigs and an increase in their associated CV. A lower percentage of pigs in the population were predicted to meet their requirements during the feeding period considered, by comparison with the no-variation scenario. Variation in feed ingredient composition contributed more to performance variation than variation due to mixing efficiency. When variations in both feed composition and pig traits were considered, it was the former rather than the latter that had the dominant influence on variability in pig performance. The developed framework emphasises the consequences of random variability on the predictions of nutrient utilisation models. Such consequences will have a significant impact on decisions about management strategies such as feeding that are subject to variation. PMID:26608351
The increasing complexity of software engineering requires effective methods and tools to support requirements analysts' activities. While much of a company's knowledge can be found in text repositories, current content management systems have limited capabilities for structuring and interpreting documents. In this context, we propose a tool for transforming text documents describing users' requirements into a UML model. The presented tool uses Natural Language Processing (NLP) and semantic rules to generate a UML class diagram. The main contribution of our tool is to provide assistance to designers, facilitating the transition from a textual description of user requirements to their UML diagrams, based on GATE (General Architecture for Text Engineering) by formulating the necessary rules that generate new semantic annotations.
We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries, an
Towards an event-annotated corpus of Polish. The paper presents a typology of events built on the basis of the TimeML specification adapted to the Polish language. Some changes were introduced to the definition of the event categories, and a motivation for event categorization was formulated. The event annotation task is presented on two levels – ontology level (language independent) and text mentions (language dependent). The various types of event mentions in Polish text are discussed. A procedure for annotation of event mentions in Polish texts is presented and evaluated. In the evaluation, a randomly selected set of documents from the Corpus of Wrocław University of Technology (called KPWr) was annotated by two linguists and the annotator agreement was calculated. The evaluation was done in two iterations. After the first evaluation we revised and improved the annotation procedure. The second evaluation showed a significant improvement of the agreement between annotators. The current work was focused on annotation and categorisation of event mentions in text. Future work will focus on description of events with a set of attributes, arguments and relations.
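Inter-annotator agreement of the kind calculated above (and the kappa values reported by several entries in this listing) is commonly measured with Cohen's kappa, which discounts raw agreement by the agreement expected from each annotator's label distribution. A minimal, self-contained sketch, not code from any of the papers listed here:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(ann1) == len(ann2) and len(ann1) > 0
    n = len(ann1)
    # Observed agreement: fraction of items given identical labels.
    po = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Chance agreement from the two annotators' marginal label frequencies.
    c1, c2 = Counter(ann1), Counter(ann2)
    pe = sum((c1[lab] / n) * (c2[lab] / n) for lab in set(ann1) | set(ann2))
    return (po - pe) / (1 - pe)

print(cohens_kappa("AABB", "ABBB"))  # → 0.5
```

Values near 1 indicate near-perfect agreement (e.g. the kappa=0.942 reported for one of the Twitter NER datasets below), while 0 means agreement no better than chance.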
Kügler, Frank; Smolibocki, Bernadett; Arnold, Denis;
easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue...
Mardanbeigi, Diako; Qvarfordt, Pernilla
To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head-mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation, ...
Fromreide, Hege; Hovy, Dirk; Søgaard, Anders
We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a...
Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria
We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...
Lin, Jian-Wei; Lai, Yuan-Cheng
This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…
Background: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results: This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions: As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are
Huntley, Rachael P; Sawford, Tony; Mutowo-Meullenet, Prudence; Shypitsyna, Aleksandra; Bonilla, Carlos; Martin, Maria J.; O'Donovan, Claire
The Gene Ontology Annotation (GOA) resource (http://www.ebi.ac.uk/GOA) provides evidence-based Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). Manual annotations provided by UniProt curators are supplemented by manual and automatic annotations from model organism databases and specialist annotation groups. GOA currently supplies 368 million GO annotations to almost 54 million proteins in more than 480 000 taxonomic groups. The resource now provides annotat...
Muiste, P.; Tullus, H.; Uri, V. [Estonian Agricultural University, Tartu (Estonia)]
At the end of the Soviet period in the 1980s, a long-term energy programme for Estonia was worked out. The energy system was planned to be based on nuclear power and the share of domestic alternative sources of energy was low. The situation has greatly changed after the re-establishment of Estonian independence, and now wood and peat fuels play an important role in the energy system. Energy consumption in Estonia decreased during the period 1970-1993, but this process has less influenced the consumption of domestic renewable fuels - peat and wood. It means that the share of these fuels has grown. The investment in substitution of imported fossil fuels and in conversion of boiler plants from fossil fuels to domestic fuels has reached the level of USD 100 million. The perspectives of wood energy depend mainly on two factors: the resources and the price of wood energy compared with other fuels. The situation in the wood market influences both the possible quantities and the price. It is typical that the quickly growing cost of labour in Estonia is greatly affecting the price of energy wood. Though the price level of fuel peat and wood chips is lower than the world market price today, the conditions for using biofuels could be more favourable if higher environmental fees were introduced. In conjunction with increasing utilisation of biofuels it is important to evaluate possible emissions or removal of greenhouse gases from Estonian forests.
Bais, Abha S.; Grossmann, Steffen; Vingron, Martin
Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single-sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Moreover, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, and those remain multi-step as well. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments, that is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits). Moreover, the pair-profile-related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites into the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution. Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification
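The standard Smith-Waterman local-alignment recurrence that SimAnn builds on can be sketched as follows. This shows only the basic dynamic program (score only, no traceback) with arbitrary illustrative match/mismatch/gap scores; the additional profile states and evolutionary models described above are beyond a short sketch:

```python
def smith_waterman_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Best local alignment score via the Smith-Waterman dynamic program:
    H[i][j] = max(0, diagonal + substitution score, gap from above or left)."""
    H = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    best = 0
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("ACGTT", "ACGAT"))  # → 7 (ACG match, T/A mismatch, T match)
```

As the abstract describes, SimAnn augments this recurrence with pair-profile states so that gaplessly aligned TFBS hits are emitted as annotated parts of the alignment.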
Abstract Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high worldwide economic impact. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2), and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in the main processes of fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stages 1 and 2 (libraries A and B), and stages 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases, and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences were further investigated by Real-Time PCR, validating the SSH results at a rate as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the Gene Ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO term mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO-based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category when comparing the three fruit developmental stages. The olive fruit-specific transcriptome dataset was
Ide, Nancy; Erjavec, Tomaz
It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becoming increasingly mandatory. To answer this need, we have developed a framework comprising an abstract model for a variety of annotation types (e.g., morpho-syntactic tagging, syntactic annotation, co-reference annotation, etc.), which can be instantiated in different ways depending on the annotator's approach and goals. In this paper we provide an overview of the framework, demonstrate its applicability to syntactic annotation, and show how it can contribute to comparative evaluation of parser output and diverse syntactic annotation schemes.
Seyed, P.; Chastain, K.; McGuinness, D. L.
library of vocabularies to assist the user in locating terms to describe observed entities, their properties, and relationships. The Annotator leverages vocabulary definitions of these concepts to guide the user in describing data in a logically consistent manner. The vocabularies made available through the Annotator are open, as is the Annotator itself. We have taken a step towards making semantic annotation/translation of data more accessible. Our vision for the Annotator is as a tool that can be integrated into a semantic data 'workbench' environment, which would allow semantic annotation of a variety of data formats using standard vocabularies. The vocabularies involved enable search for similar datasets, and integration with any semantically-enabled applications for analysis and visualization.
Kiani, Saad Liaquat; Antonopoulos, Nick; Knappmeyer, Michael; Baker, Nigel; McClatchey, Richard
Ubiquitous computing environments are characterised by smart, interconnected artefacts embedded in our physical world that are projected to provide useful services to human inhabitants unobtrusively. Mobile devices are becoming the primary tools of human interaction with these embedded artefacts and of utilisation of services available in smart computing environments such as clouds. Advancements in the capabilities of mobile devices allow a number of user- and environment-related context consumers to be hosted on these devices. Without a coordinating component, these context consumers and providers are a potential burden on device resources; specifically, the effect of uncoordinated computation and communication with cloud-enabled services can negatively impact battery life. Therefore, energy conservation is a major concern in realising the collaboration and utilisation of mobile-device-based context-aware applications and cloud-based services. This paper presents the concept of a context-brokering component to aid...
Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics
Oropesa Morales, Lester Arturo; Montoya Obeso, Abraham; Hernández García, Rosaura; Cocolán Almeda, Sara Ivonne; García Vázquez, Mireya Saraí; Benois-Pineau, Jenny; Zamudio Fuentes, Luis Miguel; Martinez Nuño, Jesús A.; Ramírez Acosta, Alejandro Alvaro
Multimedia content production and storage in repositories are now increasingly widespread practices. Indexing concepts for search in multimedia libraries is very useful for users of the repositories. However, content-based retrieval and automatic video-tagging tools still lack consistency. Regardless of how these systems are implemented, it is vital to possess large sets of videos whose concepts are tagged with ground truth (training and testing sets). This paper describes a novel methodology for making complex annotations on video resources through the ELAN software. The concepts are annotated and related to Mexican nature in High Level Features (HLF) from the development set of TRECVID 2014 in a collaborative environment. Based on this set, each observed nature concept is tagged on each video shot using concepts of the TRECVid 2014 dataset. We also propose new concepts, such as tropical settings, urban scenes, actions, events, weather and places, to name a few, as well as specific concepts that best describe video content of Mexican culture. We have been careful to build the database tagged with concepts of nature and ground truth. It is evident that a collaborative environment is more suitable for the annotation of concepts related to ground truth and nature. As a result, a Mexican nature database was built, which also serves as the basis for testing and training sets to automatically classify new multimedia content of Mexican nature.
Kuwahara, Noriaki; Kuwabara, Kazuhiro; Abe, Shinji; Susami, Kenji; Yasuda, Kiyoshi
Providing good home-based care to people with dementia is becoming an important issue as the size of the elderly population increases. One of the main problems in providing such care is that it must be provided constantly, without interruption, and this puts a great burden on caregivers, who are often family members. Networked Interaction Therapy is the name we give to our methods designed to relieve the stress of people suffering from dementia as well as that of their family members. This therapy aims to provide a system that interacts with people with dementia by utilizing various engaging stimuli. One such stimulus is a reminiscence video created from old photo albums, which is a promising way to hold a dementia sufferer's attention for a long time. In this paper, we present an authoring tool to assist in the production of a reminiscence video by using photo annotations. We conducted interviews with several video creators on how they used photo annotations such as date, title and subject of photos when they produced reminiscence videos. Based on the creators' comments, we defined an ontology for representing the creators' knowledge of how to add visual effects to a reminiscence video. Subsequently, we developed an authoring tool that automatically produces a reminiscence video from the annotated photos. Subjective evaluation of the quality of reminiscence videos produced with our tool indicates that they give impressions similar to those produced by creators using conventional video editing software. The effectiveness of presenting such a video to people with dementia is also discussed.
Sanderson, Robert [Los Alamos National Laboratory]; Van De Sompel, Herbert [Los Alamos National Laboratory]
As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
Glass is a widely used product throughout the world; it is versatile, durable and reliable. Its uses vary drastically, and much waste glass is discarded, stockpiled or landfilled: large quantities of waste glass are generated, and a large percentage is disposed of in landfills. This pattern has led environmental organizations to press the professional community to lower the amount of glass being discarded and to find uses for non-recycled glass in new applications. In this connection, recycling waste glass as a component in concrete gives it a sustainable alternative to landfilling and is therefore economically viable. The proposed study utilises waste glass powder (GLP) in concrete as a partial replacement of cement, as well as crushed glass particles (CGP) retained on 1.18 mm and 2.36 mm IS sieves as a partial replacement for sand, which offers important benefits for the strength of concrete and is eco-friendly. Recycling of mixed-colour waste glass poses major problems for municipalities, and this problem can be greatly reduced by re-using waste glass as a sand/cement replacement in concrete. Moreover, re-using waste materials in construction can reduce the demand on sources of primary materials. In this project, attempts have been made to partially replace both cement and sand by waste glass powder and crushed glass particles in equal combination, at 5% intervals up to 20% replacement, and to observe the effect on the strength of concrete after 7 days and 28 days of curing.
Although coal may be viewed as a dirty fuel due to its high greenhouse emissions when combusted, a strong case can be made for coal as a major world source of clean H2 energy. Apart from the fact that resources of coal will outlast oil and natural gas by centuries, there is a shift towards developing environmentally benign coal technologies, which can lead to high energy conversion efficiencies and low air pollution emissions compared to conventional coal-fired power generation plants. There are currently several world research and industrial development projects in the areas of Integrated Gasification Combined Cycle (IGCC) and Integrated Gasification Fuel Cell (IGFC) systems. In such systems, there is a need to integrate complex unit operations including gasifiers, gas separation and cleaning units, water gas shift reactors, turbines, heat exchangers, steam generators and fuel cells. IGFC systems tested in the USA, Europe and Japan employing gasifiers (Texaco, Lurgi and Eagle) and fuel cells have achieved energy conversion efficiencies of 47.5% (HHV), much higher than the 30-35% efficiency of conventional coal-fired power generation. Solid oxide fuel cells (SOFC) and molten carbonate fuel cells (MCFC) are the front runners in energy production from coal gases. These fuel cells can operate at high temperatures and are robust to gas poisoning impurities. IGCC and IGFC technologies are expensive and currently economically uncompetitive compared to established and mature power generation technology. However, further efficiency and technology improvements, coupled with world pressures on the limitation of greenhouse gases and other gaseous pollutants, could make IGCC/IGFC technically and economically viable for hydrogen production and utilisation in clean and environmentally benign energy systems. (author)
Abstract Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data set. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for the visual display of scored features (annotations) within a sequence context. The program reads sequence data and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include the display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences, among many others. The dynamic visualisation of these annotations is useful, e.g., for determining cutoff values of predicted features to match experimental data. The program, source code and exemplary files are freely available at the BioSAVE homepage.
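In data terms, the interactive operation BioSAVE describes (restricting displayed features by a per-feature score cutoff) reduces to a filter like the sketch below. The feature tuples and threshold values are invented for illustration and do not reflect BioSAVE's internal representation.

```python
def filter_features(features, thresholds):
    """Keep (type, start, end, score) features meeting their type's cutoff.

    Features whose type has no entry in `thresholds` are always shown.
    """
    return [f for f in features if f[3] >= thresholds.get(f[0], 0.0)]

# Hypothetical scored features on a sequence.
features = [("pwm_match", 10, 18, 7.2),
            ("pwm_match", 40, 48, 3.1),
            ("conservation", 5, 60, 0.9)]

# Per-feature-type display thresholds, as if set by onscreen controls.
visible = filter_features(features, {"pwm_match": 5.0, "conservation": 0.8})
# Only the high-scoring PWM match and the conservation track remain.
```

Re-running the filter with new threshold values is what makes the "on-the-fly" adjustment cheap: the underlying feature list never changes, only the view.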
Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl
The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual annotation of samples is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for the annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. PMID:26896844
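At its simplest, the term-spotting step that tools like EXTRACT perform can be illustrated with a toy dictionary-based tagger. The term dictionary and sample text below are assumptions for demonstration only; EXTRACT's actual named entity recognition methods are published separately.

```python
# Hypothetical dictionary mapping surface terms to entity types.
TERMS = {
    "seawater": "Environment",
    "hydrothermal vent": "Environment",
    "Escherichia coli": "Organism",
}

def tag_terms(text):
    """Return (term, type, offset) for each dictionary term found in text."""
    hits = []
    low = text.lower()
    for term, ttype in TERMS.items():
        start = low.find(term.lower())  # case-insensitive first occurrence
        if start != -1:
            hits.append((term, ttype, start))
    return sorted(hits, key=lambda h: h[2])  # order by position in text

sample = "Samples of seawater near a hydrothermal vent contained Escherichia coli."
print(tag_terms(sample))
```

A real system would add tokenization, overlapping-match resolution and ontology term normalization on top of this spotting step.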
Steven N. Hart
Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations, and we are now able to generate data faster than they can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation Explorer), a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., the other columns of the annotation table). Custom pathways for underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use case describes how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.
Quinlan, D; Schordan, M; Vuduc, R; Yi, Q
This paper discusses the features of an annotation language that we believe to be essential for optimizing user-defined abstractions. These features should capture semantics of function, data, and object-oriented abstractions, express abstraction equivalence (e.g., a class represents an array abstraction), and permit extension of traditional compiler optimizations to user-defined abstractions. Our future work will include developing a comprehensive annotation language for describing the semantics of general object-oriented abstractions, as well as automatically verifying and inferring the annotated semantics.
Firth Andrew E
Abstract Background The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (~3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. Results These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions – including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot) – is integrated into the database. A large amount of data is graphically presented using GBrowse (Generic Genome Browser), adapted for the analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. Conclusion HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV-focused datasets and tools (http://hbvregdb.otago.ac.nz). The availability of multiple, highly annotated sequences of viral genomes in one database, combined with comparative analysis tools, facilitates the detection of novel genomic elements.
Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard
The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence, will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (i.e., proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation. PMID:19396742
The bibliography contains 845 separate numbered items which deal with uranium mining in Australia during the period 1970-1987, which it was feasible to annotate, which are publicly available, and which are not of a highly technical nature. The bibliography is not restricted to material originating in Australia. The items are organised into nine major subject areas on the basis of their principal subject matter, with cross references being added in cases where more than one subject area is dealt with. The nine sections deal with the development and structure of the Australian uranium industry; the uranium debate; uranium policies; uranium and Aborigines; economic issues; domestic processing and utilisation of Australian uranium; environmental issues; nuclear proliferation and safeguards; and the major individual uranium projects. The bibliography is preceded by a chapter on its scope, organisation and sources and by an overview providing background information on the nuclear fuel cycle, uranium in Australia and Australian uranium policy and is followed by an author index
Juan José Monedero Moya; Daniel Cebrián Robles; Philip Desenne
The worldwide boom in digital video may be one of the reasons behind the exponential growth of MOOC courses. The evaluation of a MOOC requires a great degree of multimedia and collaborative interaction. Given that videos are one of the main elements in these courses, it would be interesting to work on innovations that would allow users to interact with multimedia and collaborative activities within the videos. This paper is part of a collaboration project whose main objective is ...
Kronk, Gary W
Meteor showers are among the most spectacular celestial events that may be observed by the naked eye, and have been the object of fascination throughout human history. In “Meteor Showers: An Annotated Catalog,” the interested observer can access detailed research on over 100 annual and periodic meteor streams in order to capitalize on these majestic spectacles. Each meteor shower entry includes details of their discovery, important observations and orbits, and gives a full picture of duration, location in the sky, and expected hourly rates. Armed with a fuller understanding, the amateur observer can better view and appreciate the shower of their choice. The original book, published in 1988, has been updated with over 25 years of research in this new and improved edition. Almost every meteor shower study is expanded, with some original minor showers being dropped while new ones are added. The book also includes breakthroughs in the study of meteor showers, such as accurate predictions of outbursts as well ...
Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin; Jin, Hai
Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other reasons, it is very difficult to identify entities correctly. This paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviated, which induces ambiguity. SASL uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents experimental results, which confirm that SASL achieves good performance.
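The pattern-based step described above (mapping full names to abbreviations) can be sketched with a regular expression. The pattern and the last-|ABBR|-words trimming heuristic below are assumptions for illustration, not SASL's published rules.

```python
import re

# Match a capitalized phrase followed by an all-caps abbreviation in parentheses.
ABBR = re.compile(r"([A-Z][A-Za-z]+(?: [A-Za-z]+)*) \(([A-Z]{2,})\)")

def extract_abbreviations(text):
    """Map each abbreviation to its (heuristically trimmed) full name."""
    pairs = {}
    for full, abbr in ABBR.findall(text):
        words = full.split()
        # The greedy match may pick up leading words; keep only the last
        # len(abbr) words as the candidate full name.
        pairs[abbr] = " ".join(words[-len(abbr):])
    return pairs

print(extract_abbreviations("We apply a Hidden Markov Model (HMM) for name disambiguation."))
# → {'HMM': 'Hidden Markov Model'}
```

The heuristic fails when the full name contains stop words ("System for Literature"), which is exactly the kind of ambiguity that motivates the HMM-based disambiguation step the paper describes.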
Collaborative tagging systems, such as del.icio.us, CiteULike, and others, allow users to annotate objects, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations, contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make the best use of socially-generated data, since they do not take into account variations in individual users' perspectives and vocabulary. In previous work, we introduced a simple probabilistic model that takes the interests of individual annotators into account in order to find hidden topics of annotated objects. Unfortunately, our proposed approach had a number of shortcomings, including overfitting, local maxima and the requirement to specify values for some parameters. In this paper we address these shortcomings in two ways. First, we extend the model to a fully Bayesian framework. Second, we describe an infinite version of the model.
Heiss, Ann M.; and others
This annotated bibliography contains references to general graduate education and to education for the following professional fields: architecture, business, clinical psychology, dentistry, engineering, law, library science, medicine, nursing, social work, teaching, and theology. (HW)
Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh
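The F1-measures quoted above combine precision and recall in the standard way. As a reminder, the arithmetic over a system's annotation set versus a gold (or silver) standard looks like the sketch below; the span tuples are invented example data, not the corpora used in the study.

```python
def precision_recall_f1(system, gold):
    """Precision, recall and F1 of a system annotation set against a standard."""
    system, gold = set(system), set(gold)
    tp = len(system & gold)  # annotations both sets agree on
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations as (doc_id, start, end, type) spans.
sys_ann = {("doc1", 0, 5, "Disease"), ("doc1", 10, 14, "Drug")}
gold_ann = {("doc1", 0, 5, "Disease"), ("doc1", 20, 28, "Drug")}
p, r, f = precision_recall_f1(sys_ann, gold_ann)  # p = r = f = 0.5
```

This also makes the paper's observation concrete: a system can raise silver-standard precision while dragging F1 down, because F1 penalizes the low recall that comes with conservative annotation.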
In classical proteome analyses, the final experimental data are (a) images of 2D protein separations obtained by gel electrophoresis and (b) corresponding lists of proteins identified by mass spectrometry (MS). For data annotation, software tools were developed that allow protein identity data to be linked directly to 2D gels (clickable gels). GelMap is a new online software tool for annotating 2D protein maps. It allows (i) functional annotation of all identified proteins according to biological categories defined by the user, e.g. subcellular localization, metabolic pathway, or assignment to a protein complex, and (ii) annotation of several proteins per analyzed protein spot according to MS primary data. Options to differentially display proteins of functional categories offer new opportunities for data evaluation. For instance, if used for the annotation of 2D Blue native/SDS gels, GelMap allows protein complexes of low abundance to be identified. A web portal has been established for the presentation and evaluation of protein identity data related to 2D gels and is freely accessible at http://www.gelmap.de/.
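The annotation model described above (several MS-identified proteins per gel spot, each tagged with user-defined categories, with differential display by category) can be sketched as a simple data structure. All spot IDs, protein names, and category labels below are invented for illustration and do not reflect GelMap's actual data format.

```python
# Minimal sketch of a GelMap-style annotation model (illustrative data,
# not GelMap's internal representation): each gel spot can carry several
# MS-identified proteins, each tagged with user-defined categories.
gel_spots = {
    "spot_17": [
        {"protein": "AtpA", "categories": {"mitochondrion", "ATP synthase complex"}},
        {"protein": "RbcL", "categories": {"chloroplast", "Calvin cycle"}},
    ],
    "spot_42": [
        {"protein": "NduS1", "categories": {"mitochondrion", "complex I"}},
    ],
}

def spots_with_category(spots, category):
    """Return spot IDs containing at least one protein in the given
    category, mimicking differential display of functional categories."""
    return sorted(
        spot for spot, proteins in spots.items()
        if any(category in p["categories"] for p in proteins)
    )
```

Filtering spots by a category such as a protein-complex assignment is how low-abundance complexes can be picked out even when the spot is dominated by a co-migrating abundant protein.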
Kveiborg, Ole; Abate, Megersa Abera
Purpose: This chapter discusses a central aspect of freight transport, capacity utilisation, and its link to the empty running of commercial freight vehicles. Methodology/approach: The paper provides an overview of the literature on these topics and groups the contributions into two segments according to their analytical approach and origin of research. Findings: The first approach looks at utilisation based on economic theories such as firms' objective to maximise profitability and considers how various firm and haul (market) characteristics influence utilisation. The second approach stems from the transport modelling literature, and its main aim is to analyse vehicle movement and usage in a transport demand modelling context. A strand of this second group of contributions is the modelling of trip chains and their implications for the level of capacity utilisation. Research limitations: The review is not a...
The optimal radiotherapy utilisation rate (RTU) is the proportion of all cancer cases that should receive radiotherapy. The optimal RTU was estimated for nine middle-income countries as part of a larger IAEA project to better understand RTU and stage distribution.
Stamou, G.; Ossenbruggen, J.R.; Pan, J; Schreiber, A.T.
Multimedia in all forms (images, video, graphics, music, speech) is exploding on the Web. The content needs to be annotated and indexed to enable effective search and retrieval. However, recent standards and best practices for multimedia metadata don't provide semantically rich descriptions of multimedia content. On the other hand, the World Wide Web Consortium's (W3C's) Semantic Web effort has been making great progress in advancing techniques for annotating semantics of Web resources. To br...