annotation landscape analysis: Topics by WorldWideScience.org

Sample records for annotation landscape analysis

Semantic annotation in biomedicine: the current landscape.

Science.gov (United States)

Jovanović, Jelena; Bagheri, Ebrahim

2017-09-22

The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease.

Science.gov (United States)

Sifrim, Alejandro; Van Houdt, Jeroen Kj; Tranchevent, Leon-Charles; Nowakowska, Beata; Sakai, Ryo; Pavlopoulos, Georgios A; Devriendt, Koen; Vermeesch, Joris R; Moreau, Yves; Aerts, Jan

2012-01-01

The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
Analysis of LYSA-calculus with explicit confidentiality annotations

DEFF Research Database (Denmark)

Gao, Han; Nielson, Hanne Riis

2006-01-01

Recently there has been an increased research interest in applying process calculi in the verification of cryptographic protocols due to their ability to formally model protocols. This work presents LYSA with explicit confidentiality annotations for indicating the expected behavior of target...... malicious activities performed by attackers as specified by the confidentiality annotations. The proposed analysis approach is fully automatic without the need of human intervention and has been applied successfully to a number of protocols....
Annotating spatio-temporal datasets for meaningful analysis in the Web

Science.gov (United States)

Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

2014-05-01

More and more environmental datasets that vary in space and time are available in the Web. This comes along with an advantage of using the data for other purposes than originally foreseen, but also with the danger that users may apply inappropriate analysis procedures because they lack knowledge of important assumptions made during the data collection process. In order to guide towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows proofs about meaningful spatial prediction and aggregation to be carried out in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
annot8r: GO, EC and KEGG annotation of EST datasets

Directory of Open Access Journals (Sweden)

Schmid Ralf

2008-04-01

Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non
Gene coexpression network analysis as a source of functional annotation for rice genes.

Directory of Open Access Journals (Sweden)

Kevin L Childs

With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

Directory of Open Access Journals (Sweden)

Dorrell Nick

2007-06-01

Background: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.
AIGO: Towards a unified framework for the Analysis and the Inter-comparison of GO functional annotations

Directory of Open Access Journals (Sweden)

Defoin-Platel Michael

2011-11-01

Background: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. Results: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. Conclusions: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.
Semi-supervised learning based probabilistic latent semantic analysis for automatic image annotation

Institute of Scientific and Technical Information of China (English)

Tian Dongping

2017-01-01

In recent years, the multimedia annotation problem has been attracting significant research attention in the multimedia and computer vision areas, especially automatic image annotation, whose purpose is to provide an efficient and effective searching environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Then, different image features with different magnitudes will result in different performance for automatic image annotation. To this end, a Gaussian normalization method is utilized to normalize different features extracted from effective image regions segmented by the normalized cuts algorithm so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve the performance of traditional PLSA for the task of automatic image annotation.
Essential Requirements for Digital Annotation Systems

Directory of Open Access Journals (Sweden)

ADRIANO, C. M.

2012-06-01

Digital annotation systems are usually based on partial scenarios and arbitrary requirements. Accidental and essential characteristics are usually mixed in non-explicit models. Documents and annotations are linked together accidentally according to the current technology, allowing for the development of disposable prototypes, but not for the support of non-functional requirements such as extensibility, robustness and interactivity. In this paper we perform a careful analysis of the concept of annotation, studying the scenarios supported by digital annotation tools. We also derive essential requirements based on a classification of annotation systems applied to existing tools. The analysis performed and the proposed classification can be applied and extended to other types of collaborative systems.
Analysis and comparison of very large metagenomes with fast clustering and functional annotation

Directory of Open Access Journals (Sweden)

Li Weizhong

2009-10-01

Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
Methods for integrated modeling of landscape change: Interior Northwest Landscape Analysis System.

Science.gov (United States)

Jane L. Hayes; Alan A. Ager; R. James Barbour

2004-01-01

The Interior Northwest Landscape Analysis System (INLAS) links a number of resource, disturbance, and landscape simulations models to examine the interactions of vegetative succession, management, and disturbance with policy goals. The effects of natural disturbance like wildfire, herbivory, forest insects and diseases, as well as specific management actions are...
Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

Directory of Open Access Journals (Sweden)

Merchant Sabeeha S

2011-07-01

Background: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of
Lynx web services for annotations and systems analysis of multi-gene disorders.

Science.gov (United States)

Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

2014-07-01

Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
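
Gene list enrichment analysis, as mentioned in the abstract above, can be illustrated with a one-sided hypergeometric test. The following is a minimal sketch under stated assumptions: the abstract does not say which statistic Lynx uses, the hypergeometric test is simply a common choice for this task, and every name here (`hypergeom_sf`, `enrichment`, the gene identifiers, `pathway_A`, `pathway_B`) is hypothetical. The sketch does not call the Lynx web services.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n genes drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(gene_list, annotations, background):
    """Return {term: p-value} for over-representation of each term in gene_list."""
    genes = set(gene_list) & background
    results = {}
    for term, members in annotations.items():
        members = members & background      # only count genes in the background
        k = len(genes & members)            # annotated genes found in the list
        results[term] = hypergeom_sf(k, len(background), len(members), len(genes))
    return results

# Toy data (hypothetical): 20 background genes and two annotation terms.
background = {f"g{i}" for i in range(1, 21)}
annotations = {
    "pathway_A": {f"g{i}" for i in range(1, 6)},    # 5 genes
    "pathway_B": {f"g{i}" for i in range(6, 16)},   # 10 genes
}
gene_list = ["g1", "g2", "g3", "g6"]

p_values = enrichment(gene_list, annotations, background)
for term, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{term}: p = {p:.4f}")
```

In this toy example three of the four listed genes fall in the small `pathway_A` set, so its p-value is low, while the single hit in the larger `pathway_B` set is unremarkable. A real analysis would also correct for testing many terms at once.
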
GSV Annotated Bibliography

Energy Technology Data Exchange (ETDEWEB)

Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

2010-09-14

The following annotated bibliography was developed as part of the geospatial algorithm verification and validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and Validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: Image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models. Many other papers were studied during the course of the investigation including. The annotations for these articles can be found in the paper "On the verification and validation of geospatial image analysis algorithms".
Objective-guided image annotation.

Science.gov (United States)

Mao, Qi; Tsang, Ivor Wai-Hung; Gao, Shenghua

2013-04-01

Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so that they are inevitably trapped into suboptimal performance of these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to the analysis of various measures and the high time complexity of optimizing micro-averaging measures, in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four
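
The contrast the abstract draws between macro-averaged, micro-averaged, example-based and Hamming measures can be made concrete on a toy multi-label problem. The following is a minimal sketch using the standard textbook definitions of these measures; it does not reproduce the paper's unified representation or its learning framework, and the keywords and predictions are invented for illustration.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def label_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one keyword."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
    return tp, fp, fn

def macro_f1(y_true, y_pred, labels):
    """Average of per-keyword F1: every keyword weighs the same."""
    return sum(f1(*label_counts(y_true, y_pred, l)) for l in labels) / len(labels)

def micro_f1(y_true, y_pred, labels):
    """F1 over counts pooled across keywords: frequent keywords dominate."""
    totals = [sum(c) for c in zip(*(label_counts(y_true, y_pred, l) for l in labels))]
    return f1(*totals)

def example_f1(y_true, y_pred):
    """Average of per-image F1 between true and predicted keyword sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        denom = len(t) + len(p)
        scores.append(2 * len(t & p) / denom if denom else 1.0)
    return sum(scores) / len(scores)

def hamming_loss(y_true, y_pred, labels):
    """Fraction of (image, keyword) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))

# Toy data (hypothetical): "tiger" is an infrequent keyword and is missed once.
labels = ["sky", "tree", "tiger"]
y_true = [{"sky", "tree"}, {"sky"}, {"sky", "tree"}, {"sky", "tiger"}]
y_pred = [{"sky", "tree"}, {"sky"}, {"sky", "tree"}, {"sky"}]

macro = macro_f1(y_true, y_pred, labels)
micro = micro_f1(y_true, y_pred, labels)
per_example = example_f1(y_true, y_pred)
hamming = hamming_loss(y_true, y_pred, labels)
print(f"macro-F1={macro:.3f} micro-F1={micro:.3f} "
      f"example-F1={per_example:.3f} Hamming loss={hamming:.3f}")
```

A single missed occurrence of the rare keyword drops macro-F1 to about 0.667, while micro-F1 and example-based F1 stay above 0.9. This is one simple way to see the sensitivity of macro-averaging to infrequent keywords that the abstract reports.
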
Technology Trends Analysis Using Patent Landscaping

Directory of Open Access Journals (Sweden)

Sergey Vsevolodovich Kortov

2017-09-01

The article is devoted to the analysis and the choice of the priorities in technology development and, particularly, to the use of patent landscaping as a tool for the study of technology trends. Currently, patent activity indicators are often used for technology foresight and for competitive intelligence as well. Nevertheless, the causal relationships between these indicators, on the one hand, and strategic and tactical decisions in the sphere of technological development at the meso- and microeconomic level, on the other hand, have not been investigated adequately enough to solve practical tasks. The goal of the work is to systemize the challenges of technology trends analysis that could be effectively solved on the basis of patent landscape analysis. The article analyses the patent landscaping methodology and tools, as well as their use for evaluating the current competitive environment and technology foresight. The authors formulated a generalized classification of the criteria of promising technologies for a selected region. To assess the compliance of a technology with these criteria, we propose a system of corresponding indicators of patenting activity. Using the proposed methodology, we have analysed the patent landscape to select promising technologies for the Sverdlovsk region. The research confirmed the hypothesis about the performance of patent landscapes in evaluating such technology indicators as life cycle stage, universality (applicability in different industries), pace of worldwide development, availability of innovations and science in the region, and potential possibilities for scientific collaboration with international research institutions and universities. The results of the research may be useful to a wide audience, including representatives of small and medium enterprises, large companies and regional authorities, for tasks concerned with technology trends analysis and technology strategy design
Annotating non-coding regions of the genome.

Science.gov (United States)

Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B

2010-08-01

Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
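
The sequence of steps described in the abstract above (per-base signal, smoothing, segmentation into small blocks, grouping blocks into larger annotations) can be sketched as follows. This is a minimal illustration under stated assumptions: a moving average stands in for the smoothing step, a fixed threshold for the segmentation step, and a simple merge of nearby blocks for the clustering step. The function names, window size, threshold and toy signal are hypothetical and are not taken from the paper.

```python
from statistics import mean

def smooth(signal, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    return [mean(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]

def segment(signal, threshold):
    """Return half-open (start, end) blocks where the signal is >= threshold."""
    blocks, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                       # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))       # the block closes before i
            start = None
    if start is not None:                   # block still open at the end
        blocks.append((start, len(signal)))
    return blocks

def merge_blocks(blocks, max_gap):
    """Join neighbouring blocks separated by at most max_gap positions."""
    merged = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Toy per-base signal (hypothetical read depth along a short stretch of genome).
raw = [0, 0, 5, 6, 5, 0, 0, 4, 5, 4, 0, 0, 0, 0, 0, 6, 6, 0]
blocks = segment(smooth(raw), threshold=2.5)
regions = merge_blocks(blocks, max_gap=2)
print("initial blocks: ", blocks)
print("derived regions:", regions)
```

Real pipelines typically use more elaborate smoothing and statistical segmentation, but the data flow from signal to blocks to derived regions follows the same outline.
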
Landscape function analysis as a base of rural development strategies

Directory of Open Access Journals (Sweden)

Filepné Kovács Krisztina

2017-11-01

Research on ecosystem services and landscape functions is highly important in landscape ecology, landscape planning and open space design. The terms ecosystem service and landscape function have evolved in parallel in the scientific literature but have different foci. The term landscape function evolved from the scientific field of landscape ecology; it reflects the goods and services provided by regions and landscapes, where cultural and economic factors are important as well. As a framework assessment method with additional economic assessment, a landscape function analysis could be an additional tool of rural development, as it gives a complex analysis of multiple aspects; thus it is highly appropriate for exploring and analyzing the potentials, resources and limits of landscapes and land use systems. In the current research, a landscape function analysis was compared with the rural development strategies in Hungarian micro-regions. We focused on the level of landscape functions and the objectives of the rural development strategies of the study areas. The local development strategies focus neither on territorial differences nor on potentials arising from natural and cultural resources or local constraints. The only exception is tourism development, where in some cases there is a holistic spatial approach which intends to develop the region as a whole.
Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

Directory of Open Access Journals (Sweden)

Yushen Du

2016-11-01

Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins for which homologous-structure information is available.

MIPS: analysis and annotation of genome information in 2007.

Science.gov (United States)

Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

2008-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Grass buffers for playas in agricultural landscapes: An annotated bibliography

Science.gov (United States)

Melcher, Cynthia P.; Skagen, Susan K.

2005-01-01

This bibliography and associated literature synthesis (Melcher and Skagen, 2005) was developed for the Playa Lakes Joint Venture (PLJV). The PLJV sought compilation and annotation of the literature on grass buffers for protecting playas from runoff containing sediments, nutrients, pesticides, and other contaminants. In addition, PLJV sought information regarding the extent to which buffers may attenuate the precipitation runoff needed to fill playas, and avian use of buffers. We emphasize grass buffers, but we also provide information on other buffer types.
Annotation of rule-based models with formal semantics to enable creation, analysis, reuse and visualization

Science.gov (United States)

Misirli, Goksel; Cavaliere, Matteo; Waites, William; Pocock, Matthew; Madsen, Curtis; Gilfellon, Owen; Honorato-Zimmer, Ricardo; Zuliani, Paolo; Danos, Vincent; Wipat, Anil

2016-01-01

Motivation: Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. Results: We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof of concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although examples are given using specific implementations, the proposed techniques can be applied to rule-based models in general. Availability and implementation: The annotation ontology for rule-based models can be found at http
Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

Directory of Open Access Journals (Sweden)

Jianfang Cao

2015-01-01

Full Text Available With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.
Urban thermal landscape characterization and analysis

International Nuclear Information System (INIS)

Xue, Y; Fung, T; Tsou, J

2014-01-01

Urban warming is sensitive to the nature (thermal properties, including albedo, water content, heat capacity and thermal conductivity) and the placement (surface geometry or urban topography) of the urban surface. In this research, the pattern and variation of urban surface temperature is regarded as one kind of landscape, the urban thermal landscape, which is assumed to be the presentation of the local surface heating process upon the urban landscape. The goal of this research is to develop a research framework incorporating geospatial statistics, thermal infrared remote sensing and landscape ecology to study the urban effect on the local surface thermal landscape regarding both pattern and process. This research chose Hong Kong as the case study. Within the study area, urban and rural areas coexist upon a hilly topography. To probe whether the local surface warming mechanism differs between urban and rural areas, the sample points were grouped into urban and rural categories in accordance with the land use map and entered into linear regression models separately. Global regression analysis confirmed the relationship between environmental factors and surface temperature, and a distinct urban-rural mechanism dominating diurnal surface warming was uncovered.
Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

Science.gov (United States)

Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

2012-01-01

Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
A landscape analysis plan

Science.gov (United States)

Nancy E. Fleenor

2002-01-01

A Landscape Analysis Plan (LAP) sets out broad guidelines for project development within boundaries of the Kings River Sustainable Forest Ecosystems Project. The plan must be a dynamic, living document, subject to change as new information arises over the course of this very long-term project (several decades). Two watersheds, each of 32,000 acres, were dedicated to...
MIPS: analysis and annotation of proteins from whole genomes.

Science.gov (United States)

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts

Directory of Open Access Journals (Sweden)

Kim Sun

2008-03-01

Full Text Available Abstract. Background: Once a new genome is sequenced, one of the important questions is to determine the presence and absence of biological pathways. Analysis of biological pathways in a genome is a complicated task since a number of biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is much beyond the capability of biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses. Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface where various sequence-based analyses can be performed to compare enzymes (e.g. sequence clustering) and pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that FASTA and CSR-HMM search methods generally outperform Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used
Solar Tutorial and Annotation Resource (STAR)

Science.gov (United States)

Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

2009-12-01

We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our “STAR” suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans’ superior ability (over computers) to identify many events. Annotation tools include time slice generation to quantify loop oscillations, the interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably utilize annotation software with new data. Upon completion of our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the data analysis of very large datasets from future solar satellites, and increased public interest and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO’s advancements in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects develop automated feature recognition software, so that computers can complete much of the initial event tagging and analysis, still, that software cannot annotate features such as sigmoids, coronal magnetic loops, coronal dimming, etc., due to large amounts of data concentrated in relatively small areas. Previously, solar physicists manually annotated these features, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to efficiently process these data. Providing analysis tools and data access to students and the public have proven
Forest landscape analysis and design: a process for developing and implementing land management objectives for landscape patterns.

Science.gov (United States)

Nancy Diaz; Dean. Apostol

1992-01-01

This publication presents a Landscape Design and Analysis Process, along with some simple methods and tools for describing landscapes and their function. The information is qualitative in nature and highlights basic concepts, but does not address landscape ecology in great depth. Readers are encouraged to consult the list of selected references in Chapter 2 if they...
Pipeline to upgrade the genome annotations

Directory of Open Access Journals (Sweden)

Lijin K. Gopi

2017-12-01

Full Text Available The current era of functional genomics is enriched with good-quality draft genomes and annotations for many thousands of species and varieties, supported by advances in next generation sequencing (NGS) technologies. Around 25,250 genomes of organisms from various kingdoms have been submitted to the NCBI genome resource to date. Each of these genomes was annotated using the tools and knowledge-bases that were available at the time of annotation. These annotations can be expected to improve if the same genome is re-annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases, that is capable of producing better-quality annotations from the consensus of the predictions of different tools. This resource also performs various additional annotations, apart from the usual gene predictions and functional annotations, covering SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides, etc. This new annotation resource is trained to evaluate and integrate all the predictions together to resolve overlaps and ambiguities of the boundaries. One of the important highlights of this resource is its capability of predicting the phylogenetic relations of the repeats using evolutionary trace analysis and orthologous gene clusters. We also present a case study of the pipeline, in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus). It is demonstrated that this resource is capable of producing an improved annotation for a better understanding of the biology of various organisms.
Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data.

Science.gov (United States)

He, Zihuai; Xu, Bin; Lee, Seunggeun; Ionita-Laza, Iuliana

2017-09-07

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
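The baseline this abstract contrasts itself with, a burden test that uses one functional annotation as a per-variant weight, can be sketched as a weighted sum of minor-allele counts per individual. The Python below shows only that baseline score under assumed inputs; it is not the unified multi-annotation test proposed by the authors, and the genotype matrix and weights are invented for illustration.

```python
def burden_scores(genotypes, weights):
    """Weighted burden score per individual: sum over variants of weight * allele count.

    genotypes: one row per individual, one column per variant (0, 1 or 2 minor alleles).
    weights:   one weight per variant, e.g. derived from a functional annotation score.
    """
    if any(len(row) != len(weights) for row in genotypes):
        raise ValueError("every genotype row needs one entry per weight")
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical data: three individuals, three rare variants.
genotypes = [
    [0, 1, 0],
    [1, 0, 2],
    [0, 0, 0],
]
weights = [0.5, 2.0, 1.0]  # illustrative annotation-derived weights
print(burden_scores(genotypes, weights))
```

If the weights are uninformative about which variants carry risk, this score dilutes the signal, which is the loss of power the abstract describes for single-annotation weighting.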
Estimating the annotation error rate of curated GO database sequence annotations

Directory of Open Access Journals (Sweden)

Brown Alfred L

2007-05-01

Full Text Available Abstract. Background: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
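The general idea of adding errors at known rates and regressing precision on them can be illustrated with a small extrapolation. The Python sketch below is a simplified stand-in, not the authors' model: it assumes precision falls linearly with the added error rate and would equal 1.0 if the annotations contained no errors at all, and the data points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_existing_error(added_rates, precisions):
    """Extrapolate the fitted line back to precision 1.0 (assumed error-free level).

    The distance from zero added error to that point is read as the error
    rate already present before any errors were added.
    """
    slope, intercept = fit_line(added_rates, precisions)
    return (intercept - 1.0) / slope


# Invented measurements: precision observed after adding errors at known rates.
added_rates = [0.0, 0.1, 0.2, 0.3]
precisions = [0.70, 0.60, 0.50, 0.40]
print(round(estimate_existing_error(added_rates, precisions), 3))
```

With these invented numbers the fitted line has slope -1 and intercept 0.7, so the extrapolated pre-existing error rate is 0.3; real precision curves need not be linear.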
Extending in silico mechanism-of-action analysis by annotating targets with pathways: application to cellular cytotoxicity readouts.

Science.gov (United States)

Liggi, Sonia; Drakakis, Georgios; Koutsoukas, Alexios; Cortes-Ciriano, Isidro; Martínez-Alonso, Patricia; Malliavin, Thérèse E; Velazquez-Campoy, Adrian; Brewerton, Suzanne C; Bodkin, Michael J; Evans, David A; Glen, Robert C; Carrodeguas, José Alberto; Bender, Andreas

2014-01-01

An in silico mechanism-of-action analysis protocol was developed, comprising molecule bioactivity profiling, annotation of predicted targets with pathways and calculation of enrichment factors to highlight targets and pathways more likely to be implicated in the studied phenotype. The method was applied to a cytotoxicity phenotypic endpoint, with enriched targets/pathways found to be statistically significant when compared with 100 random datasets. Application to a smaller apoptotic set (10 molecules) did not yield statistically relevant results, suggesting that the protocol requires modification, such as analysis of the most frequently predicted targets/annotated pathways. Pathway annotations improved the mechanism-of-action information gained by target prediction alone, allowing a better interpretation of the predictions and providing better mapping of targets onto pathways.
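An enrichment factor of the kind mentioned here is commonly computed as the frequency of a target among the phenotype molecules divided by its frequency in a background set, and significance can be gauged against values from random datasets. The Python sketch below uses that common definition as an assumption; the abstract does not give the exact formula, and the target names and data are hypothetical.

```python
def enrichment_factor(target, phenotype_preds, background_preds):
    """Frequency of `target` in the phenotype set divided by its background frequency.

    Each *_preds argument is a list with one set of predicted targets per molecule.
    """
    f_pheno = sum(target in p for p in phenotype_preds) / len(phenotype_preds)
    f_back = sum(target in p for p in background_preds) / len(background_preds)
    return float("inf") if f_back == 0 else f_pheno / f_back


def empirical_p(observed, random_values):
    """Share of random datasets whose value is at least the observed one (add-one corrected)."""
    at_least = sum(v >= observed for v in random_values)
    return (at_least + 1) / (len(random_values) + 1)


# Hypothetical predictions: one set of predicted targets per molecule.
phenotype = [{"A", "B"}, {"A"}, {"C"}, {"A"}]
background = [{"A"}, {"B"}, {"C"}, {"B"}]
ef = enrichment_factor("A", phenotype, background)
print(ef, empirical_p(ef, [1.0, 0.5, 3.5, 2.0]))
```

The same two functions apply unchanged to pathways if each set holds pathway annotations instead of targets.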
[Applications of 2D and 3D landscape pattern indices in landscape pattern analysis of mountainous area at county level].

Science.gov (United States)

Lu, Chao; Qi, Wei; Li, Le; Sun, Yao; Qin, Tian-Tian; Wang, Na-Na

2012-05-01

Landscape pattern indices are commonly used tools for the quantitative analysis of landscape pattern. However, the traditional 2D landscape pattern indices neglect the effects of terrain on landscape and therefore have definite limitations in quantitatively describing landscape patterns in mountainous areas. Taking Qixia City, a typical mountainous and hilly region in Shandong Province of East China, as a case, this paper compared 2D and 3D landscape pattern indices in quantitatively describing the landscape patterns and their dynamic changes in mountainous areas. On the basis of terrain structure analysis, a set of landscape pattern indices was selected, including area and density (class area and mean patch size), edge and shape (edge density, landscape shape index, and fractal dimension of mean patch), diversity (Shannon's diversity index and evenness index), and gathering and spread (contagion index). There were obvious differences between the 3D class area, mean patch area, and edge density and the corresponding 2D indices, but no significant differences between the 3D landscape shape index, fractal dimension of mean patch, Shannon's diversity index, and evenness index and the corresponding 2D indices. The 3D contagion index and 2D contagion index did not differ. Because the 3D landscape pattern indices were calculated using patch surface area and surface perimeter whereas the 2D landscape pattern indices were calculated using patch projected area and projected perimeter, the 3D landscape pattern indices could be relatively accurate and efficient in describing landscape area, density and borderline in mountainous areas. However, there were no distinct differences between the 3D and 2D landscape pattern indices in describing landscape shape, diversity, and gathering and spread. Generally, by introducing 3D landscape pattern indices to topographic pattern, the description of landscape pattern and its dynamic
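The contrast between area-type and diversity-type indices can be made concrete with the standard definitions of Shannon's diversity index, H = -sum(p_i ln p_i), and evenness, H / ln(m), together with a simple slope correction for surface area. The Python sketch below assumes surface area is approximated as projected area divided by the cosine of the mean slope, which is a common approximation and not necessarily the procedure used in the paper; the class areas and slopes are invented.

```python
import math


def shannon_diversity(areas):
    """Shannon's diversity index over class areas: -sum(p * ln p)."""
    total = sum(areas)
    props = [a / total for a in areas if a > 0]
    return -sum(p * math.log(p) for p in props)


def shannon_evenness(areas):
    """Shannon's evenness index: diversity divided by ln(number of classes)."""
    m = sum(1 for a in areas if a > 0)
    return shannon_diversity(areas) / math.log(m) if m > 1 else 0.0


def surface_area(projected_area, slope_deg):
    """Approximate 3D surface area from 2D projected area and mean slope (assumed model)."""
    return projected_area / math.cos(math.radians(slope_deg))


# Invented class areas (ha) and mean slopes (degrees) for three land-cover classes.
areas_2d = [50.0, 30.0, 20.0]
slopes = [5.0, 20.0, 30.0]
areas_3d = [surface_area(a, s) for a, s in zip(areas_2d, slopes)]
print(areas_3d)
print(shannon_diversity(areas_2d), shannon_diversity(areas_3d))
```

Because the diversity indices depend only on class proportions, they shift far less than the raw class areas when slope is taken into account, which is consistent with the pattern the abstract reports.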
Multitemporal spatial pattern analysis of Tulum's tropical coastal landscape

Science.gov (United States)

Ramírez-Forero, Sandra Carolina; López-Caloca, Alejandra; Silván-Cárdenas, José Luis

2011-11-01

The tropical coastal landscape of Tulum in Quintana Roo, Mexico has a high ecological, economic, social and cultural value; it provides environmental and tourism services at global, national, regional and local levels. The landscape of the area is heterogeneous and presents random fragmentation patterns. In recent years, tourist services in the region have increased, promoting an accelerated expansion of hotel, transportation and recreation infrastructure that is altering the complex landscape. It is important to understand the environmental dynamics through temporal changes in the spatial patterns and to propose better management of this ecological area to the authorities. This paper addresses a multi-temporal analysis of land cover changes from 1993 to 2000 in Tulum using Thematic Mapper data acquired by Landsat-5. Two independent methodologies were applied for the analysis of changes in the landscape and for the definition of fragmentation patterns. First, an Iteratively Reweighted Multivariate Alteration Detection (IR-MAD) algorithm was used to detect and localize land cover change/no-change areas. Second, post-classification change detection was evaluated using the Support Vector Machine (SVM) algorithm. Landscape metrics were calculated from the results of IR-MAD and SVM. The analysis of the metrics indicated, among other things, a higher fragmentation pattern along roadways.
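Post-classification change detection compares two already classified maps pixel by pixel, and a simple fragmentation metric is the number of connected patches per class. The Python sketch below illustrates only those two steps on toy grids; it does not implement IR-MAD or SVM classification, and the class codes ("F" for forest, "U" for built-up) and maps are hypothetical.

```python
from collections import Counter


def change_matrix(before, after):
    """Count pixel transitions (class_before, class_after) between two classified maps."""
    return Counter(
        (b, a)
        for row_before, row_after in zip(before, after)
        for b, a in zip(row_before, row_after)
    )


def patch_count(grid, cls):
    """Number of 4-connected patches of class `cls`, found by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls or (r, c) in seen:
                continue
            patches += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == cls and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return patches


# Toy maps: a road-like strip of built-up pixels splits one forest patch in two.
map_1993 = ["FFFF", "FFFF", "FFFF"]
map_2000 = ["FFUF", "FFUF", "FFUF"]
print(change_matrix(map_1993, map_2000))
print(patch_count(map_1993, "F"), patch_count(map_2000, "F"))
```

A rising patch count for the same class between dates is one simple indicator of the kind of roadside fragmentation the abstract describes.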
Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

Science.gov (United States)

Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

2016-11-01

Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is
Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

Directory of Open Access Journals (Sweden)

Roz Laing

The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans, of which 60% had putative orthologues, significantly higher than in previous analyses based on EST analysis. Gene density appears to be lower in H. contortus than in C. elegans, with annotated H. contortus genes being on average two to three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid

Enabling histopathological annotations on immunofluorescent images through virtualization of hematoxylin and eosin

Directory of Open Access Journals (Sweden)

Amal Lahiani

2018-01-01

Context: Medical diagnosis and clinical decisions rely heavily on the histopathological evaluation of tissue samples, especially in oncology. Historically, classical histopathology has been the gold standard for tissue evaluation and assessment by pathologists. The most widely used dyes in histopathology are hematoxylin and eosin (H&E), as the diagnosis of most malignancies is largely based on this protocol. H&E staining has been used for more than a century to identify the tissue characteristics and structural morphologies that are needed for tumor diagnosis. In many cases, as tissue is scarce in clinical studies, fluorescence imaging is necessary to allow staining of the same specimen with multiple biomarkers simultaneously. Since fluorescence imaging is a relatively new technology in the pathology landscape, histopathologists are not used to or trained in annotating or interpreting these images. Aims, Settings and Design: To allow pathologists to annotate these images without the need for additional training, we designed an algorithm for the conversion of fluorescence images to brightfield H&E images. Subjects and Methods: In this algorithm, we use fluorescent nuclei staining to reproduce the hematoxylin information and natural tissue autofluorescence to reproduce the eosin information, avoiding the need to specifically stain the proteins or intracellular structures with an additional fluorescence stain. Statistical Analysis Used: Our method is based on optimizing a transform function from fluorescence to H&E images using least mean square optimization. Results: The method produces high-quality virtual H&E digital images that can easily and efficiently be analyzed by pathologists. We validated our results with pathologists by having them annotate tumor in real and virtual H&E whole slide images, and we obtained promising results. Conclusions: Hence, we provide a solution that enables pathologists to assess tissue and annotate specific structures
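The fluorescence-to-H&E conversion described in this record can be illustrated with a minimal least-squares sketch. The abstract states only that a transform function is optimized by least mean squares; the affine form, the two input channels (nuclear stain and autofluorescence), and the names `fit_color_transform` and `apply_color_transform` are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def fit_color_transform(fluor, he):
    """Fit an affine map from fluorescence channels to RGB by least squares.

    fluor: (n_pixels, 2) array, assumed columns = nuclear stain, autofluorescence.
    he:    (n_pixels, 3) array of reference H&E colours in [0, 1].
    Returns a (3, 3) matrix whose last row is the bias term.
    """
    design = np.hstack([fluor, np.ones((fluor.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, he, rcond=None)
    return weights

def apply_color_transform(fluor, weights):
    """Map fluorescence pixels to virtual H&E colours, clipped to [0, 1]."""
    design = np.hstack([fluor, np.ones((fluor.shape[0], 1))])
    return np.clip(design @ weights, 0.0, 1.0)

# Synthetic demonstration data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
fluor = rng.random((500, 2))
true_weights = np.array([
    [-0.5, -0.5, -0.2],   # nuclear signal darkens the pixel
    [-0.1, -0.4, -0.3],   # autofluorescence shifts the pixel toward pink
    [0.95, 0.95, 0.95],   # bright background where there is no signal
])
he = np.hstack([fluor, np.ones((500, 1))]) @ true_weights

weights = fit_color_transform(fluor, he)
virtual_he = apply_color_transform(fluor, weights)
```

On noise-free synthetic data the fit recovers the generating matrix; with real paired images a non-linear transform may be needed, which this sketch does not attempt.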
The role of automated speech and audio analysis in semantic multimedia annotation

NARCIS (Netherlands)

de Jong, Franciska M.G.; Ordelman, Roeland J.F.; van Hessen, Adrianus J.

This paper overviews the various ways in which automatic speech and audio analysis can be deployed to enhance the semantic annotation of multimedia content, and as a consequence to improve the effectiveness of conceptual access tools. A number of techniques will be presented, including the alignment
Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation

Directory of Open Access Journals (Sweden)

Jeffrey Rosenfeld

2016-03-01

Twenty-one fully sequenced and well annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of e-value cutoff in ortholog determination we used scaled e-value cutoffs and a single linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a nexus file with the data matrices used in phylogenetic analysis, (3) a nexus file with the Newick trees generated by phylogenetic analysis, (4) an excel file listing the Core (CORE) genes and Unique (UNI) genes found in five insect groups, and (5) a figure showing a plot of consistency index (CI) versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains and bar plots of gains and losses for four consistency index (CI) cutoffs.
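Single linkage clustering under an e-value cutoff can be sketched with a union-find structure: any two genes joined by a hit at or below the cutoff end up in the same group, directly or through intermediates. The sketch below uses one fixed cutoff; how the record's "scaled" cutoffs are derived is not stated, so no scaling is shown. The function name and gene identifiers are hypothetical.

```python
def single_linkage_clusters(hits, evalue_cutoff):
    """Group genes into clusters by single linkage.

    hits: iterable of (gene_a, gene_b, evalue) pairwise similarity hits.
    Two genes share a cluster if a chain of hits with
    evalue <= evalue_cutoff connects them.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for gene_a, gene_b, evalue in hits:
        root_a, root_b = find(gene_a), find(gene_b)  # registers both genes
        if evalue <= evalue_cutoff and root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical hits between genes of three insect genomes.
hits = [
    ("dmel_g1", "agam_g7", 1e-50),
    ("agam_g7", "tcas_g3", 1e-30),
    ("dmel_g2", "tcas_g9", 1e-4),
]
strict = single_linkage_clusters(hits, 1e-10)
loose = single_linkage_clusters(hits, 1e-3)
```

Loosening the cutoff merges `dmel_g2` and `tcas_g9` into one group, which is the kind of sensitivity to the cutoff that the record sets out to examine.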
Vital analysis: annotating sensed physiological signals with the stress levels of first responders in action.

Science.gov (United States)

Gomes, P; Kaiseler, M; Queirós, C; Oliveira, M; Lopes, B; Coimbra, M

2012-01-01

First responders such as firefighters are exposed to extreme stress and fatigue situations during their work routines. It is thus desirable to monitor their health using wearable sensing but this is a complex and still unsolved research challenge that requires large amounts of properly annotated physiological signals data. In this paper we show that the information gathered by our Vital Analysis Framework can support the annotation of these vital signals with the stress levels perceived by the target user, confirmed by the analysis of more than 4600 hours of data collected from real firefighters in action, including 717 answers to event questionnaires from a total of 454 different events.
DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

Directory of Open Access Journals (Sweden)

Baseler Michael W

2007-11-01

Background: Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description: The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion: The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

Directory of Open Access Journals (Sweden)

Gustavo Arango-Argoty

Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.
MetaStorm: A Public Resource for Customizable Metagenomics Annotation

Science.gov (United States)

Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

2016-01-01

Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579
Geospatial Analysis of Atmospheric Haze Effect by Source and Sink Landscape

Science.gov (United States)

Yu, T.; Xu, K.; Yuan, Z.

2017-09-01

Based on a geospatial analysis model, this paper analyzes the relationship between the landscape patterns of source and sink in urban areas and atmospheric haze pollution. Firstly, the classification result and aerosol optical thickness (AOD) of Wuhan are divided into a number of square grids with a side length of 6 km, and the category-level landscape indices (PLAND, PD, COHESION, LPI, FRAC_MN) and AOD of each grid are calculated. Then the source and sink landscapes of atmospheric haze pollution are selected based on the correlation between landscape indices and AOD. Next, to make the subsequent analysis more efficient, the selected indices are screened using the correlation coefficients between them. Finally, because of the spatial dependency and spatial heterogeneity of the data used in this paper, spatial autoregressive models and a geo-weighted regression (GWR) model are used to analyze the atmospheric haze effect of source and sink landscapes at the global and local level. The results show that the source landscape of atmospheric haze pollution is the building, and the sink landscapes are shrub and woodland. PLAND, PD and COHESION are suitable for describing the atmospheric haze effect of source and sink landscapes. Comparing these models, the fit of SLM, SEM and GWR is significantly better than that of the OLS model. The SLM model is superior to the SEM model in this paper. Although the GWR model fits less well than SLM, it expresses more clearly how strongly the influencing factors affect atmospheric haze in different locations. From the results of these models, the following conclusions can be drawn: reducing the proportion of source landscape area and increasing the degree of fragmentation could cut down aerosol optical thickness; and distributing the source and sink landscape evenly and interspersedly could effectively reduce aerosol optical thickness which represents atmospheric haze
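The per-grid index calculation in this record can be illustrated for PLAND, which in the usual FRAGSTATS-style definition is the percentage of a landscape unit occupied by one class. The sketch assumes a classified raster stored as a 2D integer array and square grid cells measured in pixels; the function name, class codes and toy raster are hypothetical, and the paper's own processing chain is not given in the abstract.

```python
import numpy as np

def pland_per_grid(class_map, target_class, cell):
    """Percentage of each square grid cell covered by target_class.

    class_map: 2D integer array of land-cover class codes.
    cell:      grid cell side length in pixels.
    Rows and columns that do not fill a whole cell are dropped.
    """
    rows, cols = class_map.shape
    n_rows, n_cols = rows // cell, cols // cell
    trimmed = class_map[:n_rows * cell, :n_cols * cell]
    blocks = trimmed.reshape(n_rows, cell, n_cols, cell)
    return 100.0 * (blocks == target_class).mean(axis=(1, 3))

# Toy raster: 1 = building (candidate source), 2 = woodland, 0 = other.
class_map = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
])
building_pland = pland_per_grid(class_map, target_class=1, cell=2)
```

Each value of `building_pland` could then be paired with the mean AOD of the same grid cell to compute the correlations that the abstract uses to select source and sink landscapes.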
Publishing Landscape Archaeology in the Digital World

Directory of Open Access Journals (Sweden)

Howry Jeffrey C.

2017-12-01

The challenge of presenting micro- and macro-scale data in landscape archaeology studies is eased by a diversity of GIS technologies. Specific to scholarly research is the need to selectively share certain types of data with collaborators and academic researchers while also publishing general information in the public domain. This article presents a general model for scholarly online collaboration and teaching while providing examples of the kinds of landscape archaeology that can be published online. Specifically illustrated is WorldMap, an interactive mapping platform based upon open-source software which uses browsers built to open source standards. The various features of this platform allow tight user viewing control, views with URL referencing, commenting and certification of layers, as well as user annotation. Illustration of WorldMap features and its value for scholarly research and teaching is provided in the context of landscape archaeology studies.
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.

Science.gov (United States)

Kaplan, Noam; Vaaknin, Avishay; Linial, Michal

2003-10-01

Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.
MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

Science.gov (United States)

Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

2011-11-01

The use of Next-Generation Sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge in programming. Copyright © 2011 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Analysis and Segmentation of Face Images using Point Annotations and Linear Subspace Techniques

DEFF Research Database (Denmark)

Stegmann, Mikkel Bille

2002-01-01

This report provides an analysis of 37 annotated frontal face images. All results presented have been obtained using our freely available Active Appearance Model (AAM) implementation. To ensure the reproducibility of the presented experiments, the data set has also been made available. As such...
Landscape metrics application in ecological and visual landscape assessment

Directory of Open Access Journals (Sweden)

Gavrilović Suzana

2017-01-01

The growing application of the landscape-ecological approach in spatial planning provides theoretical and empirical evidence for monitoring the ecological consequences of natural and/or anthropogenic factors, particularly the changes in spatial structures that they cause. Landscape pattern, which features diverse landscape values, carries the unique landscape character at different spatial levels and represents a perceptual domain for its users. In landscape metrics, the parameters of landscape composition and configuration are mathematical algorithms that quantify the specific spatial characteristics used to interpret landscape features and processes (physical and ecological aspect), as well as the forms (visual aspect) and the meaning (cognitive aspect) of the landscape. Landscape metrics have been applied mostly in ecological and biodiversity assessments and in determining the level of structural change of the landscape, but are increasingly applied in assessing the visual character of the landscape. Based on a review of relevant literature, the aim of this work is to show the main trends of landscape metrics in ecological and visual assessments. The research methodology is based on the analysis, classification and systematization of research studies published from 2000 to 2016 in which landscape metrics are applied to: (1) the analysis of landscape pattern and its changes, (2) the analysis of biodiversity and habitat function and (3) visual landscape assessment. By selecting representative metric parameters of landscape composition and configuration for each category, a basis is formed for further landscape metrics research and application in the integrated ecological and visual assessment of landscape values. Contemporary conceptualization of the landscape is holistic, and future research should be directed towards the development of integrated landscape assessment
The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

Science.gov (United States)

Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

2012-01-01

The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Ubiquitous Annotation Systems

DEFF Research Database (Denmark)

Hansen, Frank Allan

2006-01-01

Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...
Current and future trends in marine image annotation software

Science.gov (United States)

Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

2016-12-01

Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation, the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) has enabled over 500 publications to date. We review its functioning, application trends and developments by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface with a video player or image browser that recognizes a specific time code or image code, allowing events to be logged in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software in their capability to integrate data associated with video collection, the simplest being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, annotating them afterwards, and interacting with a database. These range from simple annotation interfaces to full onboard data management systems with a variety of toolboxes. Advanced packages allow data from multiple sensors or multiple annotators to be input and displayed via intranet or internet. Posterior human-mediated annotation often includes tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or analyzing the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, and browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images
BisQue: cloud-based system for management, annotation, visualization, analysis and data mining of underwater and remote sensing imagery

Science.gov (United States)

Fedorov, D.; Miller, R. J.; Kvilekval, K. G.; Doheny, B.; Sampson, S.; Manjunath, B. S.

2016-02-01

Logistical and financial limitations of underwater operations are inherent in marine science, including biodiversity observation. Imagery is a promising way to address these challenges, but the diversity of organisms thwarts simple automated analysis. Recent developments in computer vision methods, such as convolutional neural networks (CNN), are promising for automated classification and detection tasks but are typically very computationally expensive and require extensive training on large datasets. Therefore, managing and connecting distributed computation, large storage and human annotations of diverse marine datasets is crucial for effective application of these methods. BisQue is a cloud-based system for management, annotation, visualization, analysis and data mining of underwater and remote sensing imagery and associated data. Designed to hide the complexity of distributed storage, large computational clusters, diversity of data formats and inhomogeneous computational environments behind a user friendly web-based interface, BisQue is built around an idea of flexible and hierarchical annotations defined by the user. Such textual and graphical annotations can describe captured attributes and the relationships between data elements. Annotations are powerful enough to describe cells in fluorescent 4D images, fish species in underwater videos and kelp beds in aerial imagery. Presently we are developing BisQue-based analysis modules for automated identification of benthic marine organisms. Recent experiments with drop-out and CNN based classification of several thousand annotated underwater images demonstrated an overall accuracy above 70% for the 15 best performing species and above 85% for the top 5 species. Based on these promising results, we have extended BisQue with a CNN-based classification system allowing continuous training on user-provided data.
Spatial Characterization of Landscapes through Multifractal Analysis of DEM

Directory of Open Access Journals (Sweden)

P. L. Aguado

2014-01-01

Full Text Available Landscape evolution is driven by abiotic, biotic, and anthropic factors. The interactions among these factors and their influence at different scales create a complex dynamic. Landscapes have been shown to exhibit numerous scaling laws, from Horton’s laws to more sophisticated scaling of heights in topography and river network topology. This scaling and multiscaling analysis has the potential to characterise the landscape in terms of the statistical signature of the measure selected. The study zone is a matrix obtained from a digital elevation model (DEM; map 10 × 10 m, and height 1 m) that corresponds to a region known as “Monte El Pardo”, which is homogeneous with respect to soil characteristics and climatology, although the water level of a reservoir and the topography play a main role in its organization and evolution. We have investigated whether the multifractal analysis of a DEM shows common features that can be used to reveal the underlying patterns and information associated with the landscape of the DEM mapping and studied the influence of the water level of the reservoir on the applied analysis. The results show that the use of the multifractal approach with mean absolute gradient data is a useful tool for analysing the topography represented by the DEM.

Annotating long intergenic non-coding RNAs under artificial selection during chicken domestication.

Science.gov (United States)

Wang, Yun-Mei; Xu, Hai-Bo; Wang, Ming-Shan; Otecko, Newton Otieno; Ye, Ling-Qun; Wu, Dong-Dong; Zhang, Ya-Ping

2017-08-15

Numerous biological functions of long intergenic non-coding RNAs (lincRNAs) have been identified. However, the contribution of lincRNAs to the domestication process has remained elusive. Following domestication from their wild ancestors, animals display substantial changes in many phenotypic traits. Therefore, it is possible that diverse molecular drivers play important roles in this process. We analyzed 821 transcriptomes in this study and annotated 4754 lincRNA genes in the chicken genome. Our population genomic analysis indicates that 419 lincRNAs potentially evolved during artificial selection related to the domestication of chicken, while a comparative transcriptomic analysis identified 68 lincRNAs that were differentially expressed under different conditions. We also found 47 lincRNAs linked to special phenotypes. Our study provides a comprehensive view of the genome-wide landscape of lincRNAs in chicken. This will promote a better understanding of the roles of lincRNAs in domestication, and the genetic mechanisms associated with the artificial selection of domestic animals.
Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus.

Science.gov (United States)

Carroll, Ronan K; Weiss, Andy; Broach, William H; Wiemels, Richard E; Mogen, Austin B; Rice, Kelly C; Shaw, Lindsey N

2016-02-09

In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

Science.gov (United States)

Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

2010-07-02

An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, supports INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating such reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data
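
As a rough illustration of the tag-counting idea in this abstract (counting how many sequenced reads fall within annotated genomic ranges), a minimal Python sketch follows. It is not TASE's code, which the abstract describes as Java with a SQL Server backend; the `Feature` type, the `count_tags` function and the sample coordinates are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical annotated genomic range (coordinates inclusive)."""
    name: str
    chrom: str
    start: int
    end: int


def count_tags(read_positions, features):
    """Count reads whose position lies inside each annotated feature."""
    counts = Counter({f.name: 0 for f in features})
    for chrom, pos in read_positions:
        for f in features:
            if f.chrom == chrom and f.start <= pos <= f.end:
                counts[f.name] += 1
    return dict(counts)


# Made-up example data: two overlapping features and four aligned reads.
features = [Feature("geneA", "chr1", 100, 200), Feature("geneB", "chr1", 150, 300)]
reads = [("chr1", 120), ("chr1", 160), ("chr1", 250), ("chr2", 160)]
tag_counts = count_tags(reads, features)
```

The nested loop is quadratic and only meant to show the counting rule; a real tool working at the data volumes mentioned above would presumably use an index or a database query instead.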
Landscape Ecology

DEFF Research Database (Denmark)

Christensen, Andreas Aagaard; Brandt, Jesper; Svenningsen, Stig Roar

2017-01-01

Landscape ecology is an interdisciplinary field of research and practice that deals with the mutual association between the spatial configuration and ecological functioning of landscapes, exploring and describing processes involved in the differentiation of spaces within landscapes......, and the ecological significance of the patterns which are generated by such processes. In landscape ecology, perspectives drawn from existing academic disciplines are integrated based on a common, spatially explicit mode of analysis developed from classical holistic geography, emphasizing spatial and landscape...... pattern analysis and ecological interaction of land units. The landscape is seen as a holon: an assemblage of interrelated phenomena, both cultural and biophysical, that together form a complex whole. Enduring challenges to landscape ecology include the need to develop a systematic approach able...
BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.

2015-08-18

Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

Science.gov (United States)

Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

2015-08-18

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
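
The idea of an "extended annotation" built by combining individual ones can be pictured with a small Python sketch. The combination rule used here (keep the first informative function seen for each gene) is an assumption made for illustration, not BEACON's documented algorithm; `extend_annotation`, the `unknown` label and the gene names are hypothetical.

```python
def extend_annotation(annotations, unknown="hypothetical protein"):
    """Combine per-method annotations, keeping the first informative
    function assigned to each gene (illustrative rule only)."""
    extended = {}
    for method_result in annotations:
        for gene, function in method_result.items():
            # Overwrite only if the gene has no informative function yet.
            if extended.get(gene, unknown) == unknown:
                extended[gene] = function
    return extended


# Made-up output of two annotation methods (AMs) for the same genome.
am1 = {"g1": "DNA polymerase", "g2": "hypothetical protein", "g3": "hypothetical protein"}
am2 = {"g1": "DNA polymerase III", "g2": "ABC transporter", "g3": "hypothetical protein"}
extended = extend_annotation([am1, am2])
```

In this toy case the gene `g2`, left without a function by the first method, gains a putative function from the second, which is the kind of gain the abstract reports.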
Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

Science.gov (United States)

Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

1997-01-01

SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
An Informally Annotated Bibliography of Sociolinguistics.

Science.gov (United States)

Tannen, Deborah

This annotated bibliography of sociolinguistics is divided into the following sections: speech events, ethnography of speaking and anthropological approaches to analysis of conversation; discourse analysis (including analysis of conversation and narrative), ethnomethodology and nonverbal communication; sociolinguistics; pragmatics (including…
The GATO gene annotation tool for research laboratories

Directory of Open Access Journals (Sweden)

A. Fujita

2005-11-01

Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.
Analysis of sea use landscape pattern based on GIS: a case study in Huludao, China.

Science.gov (United States)

Suo, Anning; Wang, Chen; Zhang, Minghui

2016-01-01

This study aims to analyse sea use landscape patterns on a regional scale based on methods of landscape ecology integrated with sea use spatial characteristics. Several landscape-level analysis indices, such as the dominance index, complex index, intensivity index, diversity index and sea congruency index, were established using Geographic Information System (GIS) and applied in Huludao, China. The results indicated that sea use landscape analysis indices, which were created based on the characteristics of sea use spatial patterns using GIS, are suitable to quantitatively describe the landscape patterns of sea use. They are operable tools for the landscape analysis of sea use. The sea use landscape in Huludao was dominated by fishing use with a landscape dominance index of 0.724. The sea use landscape is a complex mosaic with high diversity and plenty of fishing areas, as shown by the landscape complex index of 27.21 and the landscape diversity index of 1.25. Most sea use patches correspond to the marine functional zonation plan and the sea use congruency index is 0.89 in the fishing zone and 0.92 in the transportation zone.
Jannovar: a java library for exome annotation.

Science.gov (United States)

Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

2014-05-01

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
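
To illustrate the interval-tree lookup mentioned in this abstract, here is a minimal Python sketch of an augmented interval tree that returns every interval containing a query position. It is not Jannovar's implementation (Jannovar is a Java library); the `Node`, `build` and `query` names and the transcript coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int, str]  # (start, end, transcript_id), inclusive


@dataclass
class Node:
    interval: Interval
    max_end: int  # largest end coordinate in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a balanced tree keyed on interval start."""
    def _build(items):
        if not items:
            return None
        mid = len(items) // 2
        node = Node(items[mid], items[mid][1])
        node.left = _build(items[:mid])
        node.right = _build(items[mid + 1:])
        for child in (node.left, node.right):
            if child is not None:
                node.max_end = max(node.max_end, child.max_end)
        return node
    return _build(sorted(intervals))


def query(node: Optional[Node], pos: int, hits=None) -> List[str]:
    """Collect the ids of all intervals that contain pos."""
    if hits is None:
        hits = []
    if node is None or node.max_end < pos:
        return hits  # nothing in this subtree reaches pos
    query(node.left, pos, hits)
    start, end, tid = node.interval
    if start <= pos <= end:
        hits.append(tid)
    if start <= pos:  # intervals to the right start no earlier than this one
        query(node.right, pos, hits)
    return hits


# Made-up transcript ranges on a single chromosome.
tree = build([(100, 500, "tx1"), (300, 800, "tx2"), (900, 1200, "tx3")])
affected = query(tree, 350)
```

The `max_end` field lets the search skip whole subtrees that end before the variant position, which is what keeps lookups fast when many transcripts are indexed.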
GSV Annotated Bibliography

Energy Technology Data Exchange (ETDEWEB)

Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

2011-06-14

The following annotated bibliography was developed as part of the Geospatial Algorithm Verification and Validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and Validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: Image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models.
Vital analysis: field validation of a framework for annotating biological signals of first responders in action.

Science.gov (United States)

Gomes, P; Lopes, B; Coimbra, M

2012-01-01

First responders are professionals that are exposed to extreme stress and fatigue during extended periods of time. That is why it is necessary to research and develop technological solutions based on wearable sensors that can continuously monitor the health of these professionals in action, namely their stress and fatigue levels. In this paper we present the Vital Analysis smartphone-based framework, integrated into the broader Vital Responder project, that allows the annotation and contextualization of the signals collected during real action. After a contextual study we have implemented and deployed this framework in a firefighter team with 5 elements, from where we have collected over 3300 hours of annotations during 174 days, covering 382 different events. Results are analysed and discussed, validating the framework as a useful and usable tool for annotating biological signals of first responders in action.
Annotated chemical patent corpus: a gold standard for text mining.

Directory of Open Access Journals (Sweden)

Saber A Akhondi

Full Text Available Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups, each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

Science.gov (United States)

Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

2010-10-01

Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

Directory of Open Access Journals (Sweden)

Qiandong Zeng

2010-10-01

Full Text Available Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
Laughter annotations in conversational speech corpora - possibilities and limitations for phonetic analysis

NARCIS (Netherlands)

Truong, Khiet Phuong; Trouvain, Jürgen

Existing laughter annotations provided with several publicly available conversational speech corpora (both multiparty and dyadic conversations) were investigated and compared. We discuss the possibilities and limitations of these rather coarse and shallow laughter annotations. There are definition
MicroScope: a platform for microbial genome annotation and comparative genomics.

Science.gov (United States)

Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

2009-01-01

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of
A framework for annotating human genome in disease context.

Science.gov (United States)

Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M

2012-01-01

Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in biomedical literature, led by advances in genome-scale technologies, poses a challenge for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.
Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation

Directory of Open Access Journals (Sweden)

Qaiser ABBAS

2016-12-01

Full Text Available This work elaborates the semi-semantic part of speech annotation guidelines for the URDU.KON-TB treebank: an annotated corpus. A hierarchical annotation scheme was designed to label the part of speech and then applied to the corpus. This raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part of speech labels. The corpus contains text of local & international news, social stories, sports, culture, finance, religion, traveling, etc. This exercise finally contributed a part of speech annotation to the URDU.KON-TB treebank. Twenty-two main part of speech categories are divided into subcategories, which capture the morphological and semantic information encoded in them. This article mainly reports the annotation guidelines; however, it also briefly describes the development of the URDU.KON-TB treebank, which includes the raw corpus collection, the design & employment of the annotation scheme and, finally, its statistical evaluation and results. The guidelines presented will be useful for the linguistic community to annotate sentences not only for the national language Urdu but for other indigenous languages like Punjabi, Sindhi, Pashto, etc., as well.

An open annotation ontology for science on web 3.0.

Science.gov (United States)

Ciccarese, Paolo; Ocana, Marco; Garcia Castro, Leyla Jael; Das, Sudeshna; Clark, Tim

2011-05-17

There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools were then developed along with a metadata model in OWL, and the ontology was deployed to users at a major pharmaceutical company and a major academic center for feedback and additional requirements. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables "stand-off" or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO's Google Code page: http://code.google.com/p/annotation-ontology/ . The Annotation Ontology meets critical requirements for
An Annotated Bibliography of Articles in the "Journal of Speech and Language Pathology-Applied Behavior Analysis"

Science.gov (United States)

Esch, Barbara E.; Forbes, Heather J.

2017-01-01

The open-source "Journal of Speech and Language Pathology-Applied Behavior Analysis" ("JSLP-ABA") was published online from 2006 to 2010. We present an annotated bibliography of 80 articles published in the now-defunct journal with the aim of representing its scholarly content to readers of "The Analysis of Verbal…
Random versus Deterministic Descent in RNA Energy Landscape Analysis

Directory of Open Access Journals (Sweden)

Luke Day

2016-01-01

Identifying sets of metastable conformations is a major research topic in RNA energy landscape analysis, and recently several methods have been proposed for finding local minima in landscapes spawned by RNA secondary structures. An important and time-critical component of such methods is steepest, or gradient, descent in attraction basins of local minima. We analyse the speed-up achievable by randomised descent in attraction basins in the context of large sample sets where the size has an order of magnitude in the region of ~10^6. While the gain for each individual sample might be marginal, the overall run-time improvement can be significant. Moreover, for the two nongradient methods we analysed for partial energy landscapes induced by ten different RNA sequences, we found that the number of observed local minima is on average larger by 7.3% and 3.5%, respectively. The run-time improvement is approximately 16.6% and 6.8% on average over the ten partial energy landscapes. For the large sample size we selected for descent procedures, the coverage of local minima is very high up to energy values of the region where the samples were randomly selected from the partial energy landscapes; that is, the difference to the total set of local minima is mainly due to the upper area of the energy landscapes.
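The contrast between steepest and randomised descent described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy one-dimensional landscape, the `ENERGIES` values, and the function names are all hypothetical, and real RNA landscapes use secondary structures as states with base-pair moves as neighbours.

```python
import random

def steepest_descent(start, energy, neighbors):
    """Move to the lowest-energy neighbour until no neighbour is lower."""
    current = start
    while True:
        best = min(neighbors(current), key=energy, default=current)
        if energy(best) >= energy(current):
            return current  # local minimum reached
        current = best

def random_descent(start, energy, neighbors, rng):
    """Move to a randomly chosen lower-energy neighbour until none exists."""
    current = start
    while True:
        lower = [n for n in neighbors(current) if energy(n) < energy(current)]
        if not lower:
            return current  # local minimum reached
        current = rng.choice(lower)

# Hypothetical toy landscape: positions 0..7 with hand-picked energies.
ENERGIES = [3.0, 1.0, 2.0, 4.0, 2.5, 0.5, 1.5, 2.0]

def energy(i):
    return ENERGIES[i]

def neighbors(i):
    # Each position is adjacent to the positions directly left and right.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(ENERGIES)]

def is_local_minimum(i):
    return all(energy(j) >= energy(i) for j in neighbors(i))

if __name__ == "__main__":
    rng = random.Random(0)
    print("steepest from 3:", steepest_descent(3, energy, neighbors))
    print("random from 3:  ", random_descent(3, energy, neighbors, rng))
```

Steepest descent must evaluate every neighbour at each step, whereas randomised descent can stop scanning as soon as it has a lower neighbour to move to; in this sketch both variants list all neighbours for clarity, so it shows the difference in behaviour but not the run-time gain.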
Multi-scale Analysis of High Resolution Topography: Feature Extraction and Identification of Landscape Characteristic Scales

Science.gov (United States)

Passalacqua, P.; Sangireddy, H.; Stark, C. P.

2015-12-01

With the advent of digital terrain data, detailed information on terrain characteristics and on scale and location of geomorphic features is available over extended areas. Our ability to observe landscapes and quantify topographic patterns has greatly improved, including the estimation of fluxes of mass and energy across landscapes. Challenges still remain in the analysis of high resolution topography data; the presence of features such as roads, for example, challenges classic methods for feature extraction and large data volumes require computationally efficient extraction and analysis methods. Moreover, opportunities exist to define new robust metrics of landscape characterization for landscape comparison and model validation. In this presentation we cover recent research in multi-scale and objective analysis of high resolution topography data. We show how the analysis of the probability density function of topographic attributes such as slope, curvature, and topographic index contains useful information for feature localization and extraction. The analysis of how the distributions change across scales, quantified by the behavior of modal values and interquartile range, allows the identification of landscape characteristic scales, such as terrain roughness. The methods are introduced on synthetic signals in one and two dimensions and then applied to a variety of landscapes of different characteristics. Validation of the methods includes the analysis of modeled landscapes where the noise distribution is known and features of interest easily measured.
BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments.

Science.gov (United States)

López-Fernández, H; Reboiro-Jato, M; Glez-Peña, D; Aparicio, F; Gachet, D; Buenaga, M; Fdez-Riverola, F

2013-07-01

Automatic term annotation from biomedical documents and external information linking are becoming a necessary prerequisite in modern computer-aided medical learning systems. In this context, this paper presents BioAnnote, a flexible and extensible open-source platform for automatically annotating biomedical resources. Apart from other valuable features, the software platform includes (i) a rich client enabling users to annotate multiple documents in a user friendly environment, (ii) an extensible and embeddable annotation meta-server allowing for the annotation of documents with local or remote vocabularies and (iii) a simple client/server protocol which facilitates the use of our meta-server from any other third-party application. In addition, BioAnnote implements a powerful scripting engine able to perform advanced batch annotations. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Propagating annotations of molecular networks using in silico fragmentation.

Science.gov (United States)

da Silva, Ricardo R; Wang, Mingxun; Nothias, Louis-Félix; van der Hooft, Justin J J; Caraballo-Rodríguez, Andrés Mauricio; Fox, Evan; Balunas, Marcy J; Klassen, Jonathan L; Lopes, Norberto Peporine; Dorrestein, Pieter C

2018-04-18

The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One of the alternative approaches used to annotate an unknown fragmentation mass spectrum is through the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to a MS/MS spectrum in spectral libraries. This is accomplished through creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
Chado controller: advanced annotation management with a community annotation system.

Science.gov (United States)

Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie

2012-04-01

We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc . The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form . Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr. Supplementary data are available at Bioinformatics online.
Software for computing and annotating genomic ranges.

Science.gov (United States)

Lawrence, Michael; Huber, Wolfgang; Pagès, Hervé; Aboyoun, Patrick; Carlson, Marc; Gentleman, Robert; Morgan, Martin T; Carey, Vincent J

2013-01-01

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
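Overlap detection between annotated ranges, mentioned above, can be sketched in a few lines. This is a plain-Python illustration, not the Bioconductor implementation: the `find_overlaps` function, the tuple layout, and the sample coordinates are hypothetical, and the sketch assumes closed, 1-based intervals.

```python
from typing import Dict, List, Tuple

# (sequence name, start, end); closed 1-based coordinates are assumed here.
Range = Tuple[str, int, int]

def find_overlaps(query: List[Range], subject: List[Range]) -> List[Tuple[int, int]]:
    """Return (query_index, subject_index) pairs for every overlapping pair."""
    # Group subject ranges by sequence name and sort each group by start.
    by_seq: Dict[str, List[Tuple[int, int, int]]] = {}
    for j, (seq, start, end) in enumerate(subject):
        by_seq.setdefault(seq, []).append((start, end, j))
    for ranges in by_seq.values():
        ranges.sort()

    hits = []
    for i, (seq, qstart, qend) in enumerate(query):
        for start, end, j in by_seq.get(seq, []):
            if start > qend:
                break  # sorted by start, so no later subject can overlap
            if end >= qstart:
                hits.append((i, j))
    return hits

if __name__ == "__main__":
    genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
    reads = [("chr1", 150, 160), ("chr1", 190, 310), ("chr2", 151, 160)]
    print(find_overlaps(reads, genes))  # [(0, 0), (1, 0), (1, 1)]
```

The linear scan per query keeps the sketch short; a scalable implementation would typically use an interval tree or a similar index instead.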
Roadmap for annotating transposable elements in eukaryote genomes.

Science.gov (United States)

Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

2012-01-01

Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.
Comparative Analysis of Methodologies for Landscape Ecological Aesthetics in Urban Planning

Directory of Open Access Journals (Sweden)

Maija Jankevica

2012-05-01

Areas with a high level of urbanisation provoke frequent conflicts between nature and people. There is a lack of cooperation between planners and nature scientists in urban studies and the planning process. Landscapes are usually studied using the ecological and aesthetic approaches separately. However, the future of urban planning depends on integration of these two approaches. This research study looks into different methods of landscape ecological aesthetics and presents a combined method for urban areas. The methods of landscape visual aesthetic assessment, biotope structure analysis, landscape ecology evaluation and multi-disciplinary expert level are compared in the article. A comparison of the obtained values is summarized in a comparative matrix. As a result, a multi-stage model for landscape ecological aesthetics evaluation in urban territories is presented. This ecological aesthetics model can be successfully used for development of urban territories.
Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

Science.gov (United States)

Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

2007-01-01

Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
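The counts reported in this abstract are internally consistent, which a short arithmetic check makes explicit. The sketch assumes that each accession range is contiguous, with one EST per accession number; the variable names are illustrative only.

```python
def accession_count(first: int, last: int) -> int:
    """Number of accessions in a contiguous, inclusive numeric range."""
    return last - first + 1

# Accession ranges quoted in the abstract (assumed contiguous).
eh_count = accession_count(669363, 672369)  # EH669363 to EH672369
el_count = accession_count(515444, 515580)  # EL515444 to EL515580
total_ests = eh_count + el_count            # 3007 + 137 = 3144

# Clustering result: contigs plus singletons give the unique sequences.
contigs, ests_in_contigs, singletons = 448, 1361, 1783
unique_sequences = contigs + singletons     # 2231

# ESTs not assembled into contigs should equal the singleton count.
unassembled = total_ests - ests_in_contigs  # 1783

# Share of unique sequences with high similarity to GenBank nr entries.
similar_pct = round(100 * 743 / unique_sequences)  # 33

if __name__ == "__main__":
    print(total_ests, unique_sequences, unassembled, similar_pct)
```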
Analysis of Employment Flow of Landscape Architecture Graduates in Agricultural Universities

Science.gov (United States)

Yao, Xia; He, Linchun

2012-01-01

A statistical analysis of the employment flow of landscape architecture graduates was conducted using employment data for graduates majoring in landscape architecture from 2008 to 2011. The employment flow was examined in terms of admission to graduate study, industry direction and regional distribution, etc. Then, the features of talent flow and factors…
An analysis on the entity annotations in biological corpora [v1; ref status: indexed, http://f1000r.es/2o0

Directory of Open Access Journals (Sweden)

Mariana Neves

2014-04-01

Collections of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and an analysis of the semantic annotations they contain. Annotations for entity types were classified into six semantic groups and an overview of the semantic entities which can be found in each corpus is shown. Results show that while some semantic entities, such as genes, proteins and chemicals, are consistently annotated in many collections, corpora available for diseases, variations and mutations are still few, in spite of their importance in the biological domain.
Improving Microbial Genome Annotations in an Integrated Database Context

Science.gov (United States)

Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

2013-01-01

Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
Improving microbial genome annotations in an integrated database context.

Directory of Open Access Journals (Sweden)

I-Min A Chen

Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.
Evaluating management risks using landscape trajectory analysis: a case study of California fisher

Science.gov (United States)

Craig M. Thompson; William J. Zielinski; Kathryn L. Purcell

2011-01-01

Ecosystem management requires an understanding of how landscapes vary in space and time, how this variation can be affected by management decisions or stochastic events, and the potential consequences for species. Landscape trajectory analysis, coupled with a basic knowledge of species habitat selection, offers a straightforward approach to ecological risk analysis and...
Thermokarst and thaw-related landscape dynamics -- an annotated bibliography with an emphasis on potential effects on habitat and wildlife

Science.gov (United States)

Jones, Benjamin M.; Amundson, Courtney L.; Koch, Joshua C.; Grosse, Guido

2013-01-01

Permafrost has warmed throughout much of the Northern Hemisphere since the 1980s, with colder permafrost sites warming more rapidly (Romanovsky and others, 2010; Smith and others, 2010). Warming of the near-surface permafrost may lead to widespread terrain instability in ice-rich permafrost in the Arctic and the Subarctic, and may result in thermokarst development and other thaw-related landscape features (Jorgenson and others, 2006; Gooseff and others, 2009). Thermokarst and other thaw-related landscape features result from varying modes and scales of permafrost thaw, subsidence, and removal of material. An increase in active-layer depth, water accumulation on the soil surface, permafrost degradation and associated retreat of the permafrost table, and changes to lake shores and coastal bluffs act and interact to create thermokarst and other thaw-related landscape features (Shur and Osterkamp, 2007). There is increasing interest in the spatial and temporal dynamics of thermokarst and other thaw-related features from diverse disciplines including landscape ecology, hydrology, engineering, and biogeochemistry. Therefore, there is a need to synthesize and disseminate knowledge on the current state of near-surface permafrost terrain. The term "thermokarst" originated in the Russian literature, and its scientific use has varied substantially over time (Shur and Osterkamp, 2007). The modern definition of thermokarst refers to the process by which characteristic landforms result from the thawing of ice-rich permafrost or the melting of massive ice (van Everdingen, 1998), or, more specifically, the thawing of ice-rich permafrost and (or) melting of massive ice that result in consolidation and deformation of the soil surface and formation of specific forms of relief (Shur, 1988). Jorgenson (2013) identifies 23 distinct thermokarst and other thaw-related features in the Arctic, Subarctic, and Antarctic based primarily on differences in terrain condition, ground-ice volume
Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

Science.gov (United States)

Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

2016-01-01

Gene Ontology (GO) is a structured repository of concepts (GO Terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. There are different approaches to analysis; among them, the use of association rules (AR) provides useful knowledge by discovering biologically relevant associations between GO terms that were not previously known. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state of the art approaches.
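The basic quantities behind association rules, support and confidence, can be sketched as follows. This is the standard unweighted formulation, not the GO-WAR weighting scheme, which the abstract does not specify; the term labels such as `BP:a` and `MF:x` are hypothetical placeholders rather than real GO identifiers.

```python
from typing import Iterable, List, Set

def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Each set holds the (placeholder) terms annotating one gene product,
# drawn from different sub-ontologies to mimic a cross-ontology rule.
annotations = [
    {"BP:a", "MF:x", "CC:m"},
    {"BP:a", "MF:x"},
    {"BP:a", "CC:m"},
    {"BP:b", "MF:x"},
]

if __name__ == "__main__":
    print(support(annotations, {"BP:a"}))               # 0.75
    print(confidence(annotations, {"BP:a"}, {"MF:x"}))  # 2/3
```

A weighted variant would replace the plain counts with per-term weights, for example weights reflecting how specific each term is, but that choice is an assumption beyond what the abstract states.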
Software for computing and annotating genomic ranges.

Directory of Open Access Journals (Sweden)

Michael Lawrence

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
PCAS – a precomputed proteome annotation database resource

Directory of Open Access Journals (Sweden)

Luo Jingchu

2003-11-01

Background Many model proteomes or "complete" sets of proteins of given organisms are now publicly available. Much effort has been invested in computational annotation of those "draft" proteomes. Motif or domain based algorithms play a pivotal role in functional classification of proteins. Employing most available computational algorithms, mainly motif or domain recognition algorithms, we set out to develop an online proteome annotation system with integrated proteome annotation data to complement existing resources. Results We report here the development of PCAS (ProteinCentric Annotation System) as an online resource of pre-computed proteome annotation data. We applied most available motif or domain databases and their analysis methods, including hmmpfam search of HMMs in Pfam, SMART and TIGRFAM, RPS-PSIBLAST search of PSSMs in CDD, pfscan of PROSITE patterns and profiles, as well as PSI-BLAST search of SUPERFAMILY PSSMs. In addition, signal peptide and TM are predicted using SignalP and TMHMM respectively. We mapped SUPERFAMILY and COGs to InterPro, so the motif or domain databases are integrated through InterPro. PCAS displays table summaries of pre-computed data and a graphical presentation of motifs or domains relative to the protein. As of now, PCAS contains the human IPI, mouse IPI, rat IPI, A. thaliana, C. elegans, D. melanogaster, S. cerevisiae, and S. pombe proteomes. PCAS is available at http://pak.cbi.pku.edu.cn/proteome/gca.php Conclusion PCAS gives better annotation coverage for model proteomes by employing a wider collection of available algorithms. Besides presenting the most confident annotation data, PCAS also allows customized query so users can inspect statistically less significant boundary information as well. Therefore, besides providing general annotation information, PCAS could be used as a discovery platform. We plan to update PCAS twice a year. We will upgrade PCAS when new proteome annotation algorithms

Progress with modeling activity landscapes in drug discovery.

Science.gov (United States)

Vogt, Martin

2018-04-19

Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites.

Directory of Open Access Journals (Sweden)

Lex Overmars

The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any particular gene. We have formulated a reference-free method to score the TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome given prior gene-calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first factor has a clear effect. We have then formulated a straightforward Principal Component Analysis-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with annotation with the current tool of reference, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that both methods supplement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that the addition of a PCA-based strategy to a Prodigal prediction can be used to 'flag' TIS annotations for re-evaluation and in addition can be used to evaluate a given annotation in case a Prodigal annotation is lacking.
The Long Noncoding RNA Landscape of the Mouse Eye.

Science.gov (United States)

Chen, Weiwei; Yang, Shuai; Zhou, Zhonglou; Zhao, Xiaoting; Zhong, Jiayun; Reinach, Peter S; Yan, Dongsheng

2017-12-01

Long noncoding RNAs (lncRNAs) are important regulators of diverse biological functions. However, an extensive in-depth analysis of their expression profile and function in mammalian eyes is still lacking. Here we describe comprehensive landscapes of stage-dependent and tissue-specific lncRNA expression in the mouse eye. Affymetrix transcriptome array profiled lncRNA signatures from six different ocular tissue subsets (i.e., cornea, lens, retina, RPE, choroid, and sclera) in newborn and 8-week-old mice. Quantitative RT-PCR analysis validated array findings. Cis analyses and Gene Ontology (GO) annotation of protein-coding genes adjacent to signature lncRNA loci clarified potential lncRNA roles in maintaining tissue identity and regulating eye maturation during the aforementioned phase. In newborn and 8-week-old mice, we identified 47,332 protein-coding and noncoding gene transcripts. LncRNAs comprise 19,313 of these transcripts annotated in public data banks. During this maturation phase of these six different tissue subsets, the expression levels of more than 1000 lncRNAs underwent ≥2-fold changes. qRT-PCR analysis confirmed part of the gene microarray analysis results. K-means clustering identified 910 lncRNAs in the P0 groups and 686 lncRNAs in the postnatal 8-week-old groups, suggesting distinct tissue-specific lncRNA clusters. GO analysis of protein-coding genes proximal to lncRNA signatures resolved close correlations with their tissue-specific functional maturation between P0 and 8 weeks of age in the 6 tissue subsets. Characterizing maturational changes in lncRNA expression patterns as well as tissue-specific lncRNA signatures in six ocular tissues suggests important contributions made by lncRNA to the control of developmental processes in the mouse eye.
AUTHOR’S ANNOTATION AS A MANIFESTATION OF THE COMPOSER’S CREATIVE CONCEPTION

Directory of Open Access Journals (Sweden)

CIOBANU GHENADIE

2015-06-01

Annotation of his own musical works is considered by the author as a form of analysis of these opuses. Designed to provide answers about works, these comments facilitate the perception of contemporary music by performers and the audience. The composer examines various forms of annotations based on their goals and the context of use, and compares them to other genres with informative function, such as the interview, analytical essay, memoirs, personal diary, etc. The article illustrates some possible forms of annotations. Besides a purely informative character of the annotation, the author notes in the conclusions the value of genuine professional analysis, providing a wide circle of listeners and experts with a brief exegetical approach to his musical works.
Online Metacognitive Strategies, Hypermedia Annotations, and Motivation on Hypertext Comprehension

Science.gov (United States)

Shang, Hui-Fang

2016-01-01

This study examined the effect of online metacognitive strategies, hypermedia annotations, and motivation on reading comprehension in a Taiwanese hypertext environment. A path analysis model was proposed based on the assumption that if English as a foreign language learners frequently use online metacognitive strategies and hypermedia annotations,…
MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

Directory of Open Access Journals (Sweden)

Shu-Chuan Chen

The MixtureTree Annotator, written in Java, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over other programs that perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods that lack good built-in visualization tools, for example MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors, either because colors are added manually to each node or because of other limitations, for example coloring only by a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.
ANALYSIS OF ENVIRONMENTAL FRAGILITY USING MULTI-CRITERIA ANALYSIS (MCE) FOR INTEGRATED LANDSCAPE ASSESSMENT

Directory of Open Access Journals (Sweden)

Abimael Cereda Junior

2014-01-01

Geographic Information Systems have brought greater possibilities for the representation and interpretation of the landscape as well as for integrated analysis. However, this approach does not dispense with technical and methodological substantiation for achieving the computational universe. This work is grounded in ecodynamics and empirical analysis of natural and anthropogenic environmental fragility and aims to propose and present an integrated paradigm of Multi-criteria Analysis and a Fuzzy Logic Model of Environmental Fragility, taking as a case study the Basin of Monjolinho Stream in São Carlos-SP. The use of this methodology allowed for a reduction in the subjective influences of decision criteria, whose factors might have their own cartographic expression, respecting the complex integrated landscape.
Thermal Infrared Remote Sensing for Analysis of Landscape Ecological Processes: Methods and Applications

Science.gov (United States)

Quattrochi, Dale A.; Luvall, Jeffrey C.

1998-01-01

Thermal Infrared (TIR) remote sensing data can provide important measurements of surface energy fluxes and temperatures, which are integral to understanding landscape processes and responses. One example of this is the successful application of TIR remote sensing data to estimate evapotranspiration and soil moisture, where results from a number of studies suggest that satellite-based measurements from TIR remote sensing data can lead to more accurate regional-scale estimates of daily evapotranspiration. With further refinement in analytical techniques and models, the use of TIR data from airborne and satellite sensors could be very useful for parameterizing surface moisture conditions and developing better simulations of landscape energy exchange over a variety of conditions and space and time scales. Thus, TIR remote sensing data can significantly contribute to the observation, measurement, and analysis of energy balance characteristics (i.e., the fluxes and redistribution of thermal energy within and across the land surface) as an implicit and important aspect of landscape dynamics and landscape functioning. The application of TIR remote sensing data in landscape ecological studies has been limited, however, for several fundamental reasons that relate primarily to the perceived difficulty in use and availability of these data by the landscape ecology community, and from the fragmentation of references on TIR remote sensing throughout the scientific literature. It is our purpose here to provide evidence from work that has employed TIR remote sensing for analysis of landscape characteristics to illustrate how these data can provide important data for the improved measurement of landscape energy response and energy flux relationships. We examine the direct or indirect use of TIR remote sensing data to analyze landscape biophysical characteristics, thereby offering some insight on how these data can be used more robustly to further the understanding and modeling of
RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

Directory of Open Access Journals (Sweden)

Sara Torre

Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constraints, due to the lack of genomic resources. In our study, we performed deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a prerequisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG) resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed the identification of genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were discovered in silico in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis on response mechanisms to the severe constraints of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences.
Ontological Annotation with WordNet

Energy Technology Data Exchange (ETDEWEB)

Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

2006-06-06

Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.
Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

2015-01-01

Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
Evaluating Hierarchical Structure in Music Annotations.

Science.gov (United States)

McFee, Brian; Nieto, Oriol; Farbood, Morwaread M; Bello, Juan Pablo

2017-01-01

Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for "flat" descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
Evaluating Hierarchical Structure in Music Annotations

Directory of Open Access Journals (Sweden)

Brian McFee

2017-08-01

Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for “flat” descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

Energy Technology Data Exchange (ETDEWEB)

Kuo, Alan; Grigoriev, Igor

2009-04-17

Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12 percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9 percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90 percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.
Automatically annotating topics in transcripts of patient-provider interactions via machine learning.

Science.gov (United States)

Wallace, Byron C; Laws, M Barton; Small, Kevin; Wilson, Ira B; Trikalinos, Thomas A

2014-05-01

Annotated patient-provider encounters can provide important insights into clinical communication, ultimately suggesting how it might be improved to effect better health outcomes. But annotating outpatient transcripts with Roter or General Medical Interaction Analysis System (GMIAS) codes is expensive, limiting the scope of such analyses. We propose automatically annotating transcripts of patient-provider interactions with topic codes via machine learning. We use a conditional random field (CRF) to model utterance topic probabilities. The model accounts for the sequential structure of conversations and the words comprising utterances. We assess predictive performance via 10-fold cross-validation over GMIAS-annotated transcripts of 360 outpatient visits (>230,000 utterances). We then use automated in place of manual annotations to reproduce an analysis of 116 additional visits from a randomized trial that used GMIAS to assess the efficacy of an intervention aimed at improving communication around antiretroviral (ARV) adherence. With respect to 6 topic codes, the CRF achieved a mean pairwise kappa compared with human annotators of 0.49 (range: 0.47-0.53) and a mean overall accuracy of 0.64 (range: 0.62-0.66). With respect to the RCT reanalysis, results using automated annotations agreed with those obtained using manual ones. According to the manual annotations, the median number of ARV-related utterances without and with the intervention was 49.5 versus 76, respectively (paired sign test P = 0.07). When automated annotations were used, the respective numbers were 39 versus 55 (P = 0.04). While moderately accurate, the predicted annotations are far from perfect. Conversational topics are intermediate outcomes, and their utility is still being researched. This foray into automated topic inference suggests that machine learning methods can classify utterances comprising patient-provider interactions into clinically relevant topics with reasonable accuracy.
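The agreement figures quoted in this abstract (pairwise kappa and overall accuracy) can be illustrated with a small sketch. It assumes the kappa in question is Cohen's kappa between two label sequences, which the abstract does not state; the function names and the toy topic labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(labels_a, labels_b):
    """Fraction of utterances on which the two annotators agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = accuracy(labels_a, labels_b)
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up topic codes (not GMIAS data).
human = ["biomedical", "biomedical", "logistics", "logistics"]
model = ["biomedical", "logistics", "logistics", "logistics"]
print(accuracy(human, model), cohen_kappa(human, model))
```

A mean pairwise kappa, as reported in the abstract, would then be the average of this value over every (model, annotator) pair.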
Reasoning with Annotations of Texts

OpenAIRE

Ma, Yue; Lévy, François; Ghimire, Sudeep

2011-01-01

Linguistic and semantic annotations are important features for text-based applications. However, achieving and maintaining a good quality of a set of annotations is known to be a complex task. Many ad hoc approaches have been developed to produce various types of annotations, while comparing those annotations to improve their quality is still rare. In this paper, we propose a framework in which both linguistic and domain information can cooperate to reason with annotat...
AND LANDSCAPE-ECOLOGICAL ANALYSIS OF ITS DISTRIBUTION

OpenAIRE

S. M. Musaeva

2012-01-01

The article is devoted to the study of the helminth fauna of the striped lizard in the Lankaran natural region. A landscape and ecological analysis of the distribution of the helminth fauna is provided. In a study of 99 individuals of the striped lizard, a total of 14 helminth species were found: 1 trematode, 1 cestode, 3 acanthocephalans and 9 nematodes.
AND LANDSCAPE-ECOLOGICAL ANALYSIS OF ITS DISTRIBUTION

Directory of Open Access Journals (Sweden)

S. M. Musaeva

2012-01-01

The article is devoted to the study of the helminth fauna of the striped lizard in the Lankaran natural region. A landscape and ecological analysis of the distribution of the helminth fauna is provided. In a study of 99 individuals of the striped lizard, a total of 14 helminth species were found: 1 trematode, 1 cestode, 3 acanthocephalans and 9 nematodes.
GIS-based landscape design research: Stourhead landscape garden as a case study

Directory of Open Access Journals (Sweden)

Steffen Nijhuis

2017-11-01

Landscape design research is important for cultivating spatial intelligence in landscape architecture. This study explores GIS (geographic information systems) as a tool for landscape design research - investigating landscape designs to understand them as architectonic compositions (architectonic plan analysis). The concept ‘composition’ refers to a conceivable arrangement, an architectural expression of a mental construct that is legible and open to interpretation. Landscape architectonic compositions and their representations embody a great wealth of design knowledge as objects of our material culture and reflect the possible treatment of the ground, space, image and program as a characteristic coherence. By exploring landscape architectonic compositions with GIS, design researchers can acquire design knowledge that can be used in the creation and refinement of a design. The research aims to identify and illustrate the potential role of GIS as a tool in landscape design research, so as to provide insight into the possibilities and limitations of using GIS in this capacity. The critical, information-oriented case of Stourhead landscape garden (Wiltshire, UK), an example of a designed landscape that covers the scope and remit of landscape architecture design, forms the heart of the study. The exploration of Stourhead by means of GIS can be understood as a plausibility probe. Here the case study is considered a form of ‘quasi-experiment’, testing the hypothesis and generating a learning process that constitutes a prerequisite for advanced understanding, while using an adjusted version of the framework for landscape design analysis by Steenbergen and Reh (2003). This is a theoretically informed analytical method based on the formal interpretation of the landscape architectonic composition addressing four landscape architectonic categories: the basic, the spatial, the symbolic and the programmatic form. This study includes new aspects to be
Analysis of adaptive walks on NK fitness landscapes with different interaction schemes

International Nuclear Information System (INIS)

Nowak, Stefan; Krug, Joachim

2015-01-01

Fitness landscapes are genotype to fitness mappings commonly used in evolutionary biology and computer science which are closely related to spin glass models. In this paper, we study the NK model for fitness landscapes where the interaction scheme between genes can be explicitly defined. The focus is on how this scheme influences the overall shape of the landscape. Our main tool for the analysis is the adaptive walk, an idealized dynamics by which the population moves uphill in fitness and terminates at a local fitness maximum. We use three different types of walks and investigate how their length (the number of steps required to reach a local peak) and height (the fitness at the endpoint of the walk) depend on the dimensionality and structure of the landscape. We find that the distribution of local maxima over the landscape is particularly sensitive to the choice of interaction pattern. Most quantities that we measure are simply correlated to the rank of the scheme, which is equal to the number of nonzero coefficients in the expansion of the fitness landscape in terms of Walsh functions.
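As an illustration of the adaptive walks described in this abstract, the following is a minimal sketch of one possible walk on an NK landscape. Several choices here are assumptions rather than details from the paper: the interaction scheme is a cyclic "adjacent neighbours" block in which k counts the locus itself, contributions are uniform random numbers, and the walk is a greedy one that always takes the fittest one-bit neighbour. The abstract mentions three walk types without naming them, and the function names are hypothetical.

```python
import random


def make_nk_landscape(n, k, rng):
    """Return a fitness function for a random NK landscape on n binary loci.

    Assumed scheme: locus i interacts with itself and the next k-1 loci
    (cyclically). Contribution tables are filled lazily with uniform values.
    """
    neighbours = [[(i + j) % n for j in range(k)] for i in range(n)]
    tables = [{} for _ in range(n)]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = tuple(genotype[j] for j in neighbours[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total

    return fitness


def greedy_walk(fitness, start):
    """Greedy adaptive walk: move to the fittest one-bit mutant until no
    mutant is fitter. Returns (endpoint, length, height)."""
    current = list(start)
    steps = 0
    while True:
        best_locus, best_fitness = None, fitness(current)
        for i in range(len(current)):
            current[i] ^= 1              # try flipping locus i
            f = fitness(current)
            if f > best_fitness:
                best_locus, best_fitness = i, f
            current[i] ^= 1              # undo the flip
        if best_locus is None:           # local fitness maximum reached
            return current, steps, best_fitness
        current[best_locus] ^= 1
        steps += 1


rng = random.Random(0)
fit = make_nk_landscape(n=8, k=3, rng=rng)
peak, length, height = greedy_walk(fit, [0] * 8)
print(peak, length, height)
```

The walk must terminate because fitness strictly increases at every step and the genotype space is finite; the returned length and height correspond to the two quantities the abstract measures.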

MIPS: analysis and annotation of proteins from whole genomes in 2005.

Science.gov (United States)

Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system is maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
Landscape and participation: construction of a PhD research problem and an analysis method. Towards the comparative analysis of participatory processes of landscape management projects design on a local scale in the Walloon region (Belgium)

OpenAIRE

Droeven, Emilie

2007-01-01

A preliminary reflection to the definition of a PhD research problem on the concepts of participation, landscape and project, led the student to be interested in the participatory processes of landscape management projects design, and in the inhabitants landscapes representations. The method includes the comparative analysis of local processes of projects design, and the direct observation of two Walloon landscape management projects design (investigation conducted with stakeholders implied i...
Identification and annotation of erotic film based on content analysis

Science.gov (United States)

Wang, Donghui; Zhu, Miaoliang; Yuan, Xin; Qian, Hui

2005-02-01

The paper puts forward a new method for identifying and annotating erotic films based on content analysis. First, the film is decomposed into video and audio streams. Then, the video stream is segmented into shots and key frames are extracted from each shot. We filter the shots that include potential erotic content by finding the nude human body in key frames. A Gaussian model in YCbCr color space for detecting skin regions is presented. An external polygon that covers the skin regions is used to approximate the human body. Last, we give the degree of nudity by calculating the ratio of skin area to whole body area with weighted parameters. The result of the experiment shows the effectiveness of our method.
A meta-analysis of crop pest and natural enemy response to landscape complexity.

Science.gov (United States)

Chaplin-Kramer, Rebecca; O'Rourke, Megan E; Blitzer, Eleanor J; Kremen, Claire

2011-09-01

Many studies in recent years have investigated the relationship between landscape complexity and pests, natural enemies and/or pest control. However, no quantitative synthesis of this literature beyond simple vote-count methods yet exists. We conducted a meta-analysis of 46 landscape-level studies, and found that natural enemies have a strong positive response to landscape complexity. Generalist enemies show consistent positive responses to landscape complexity across all scales measured, while specialist enemies respond more strongly to landscape complexity at smaller scales. Generalist enemy response to natural habitat also tends to occur at larger spatial scales than for specialist enemies, suggesting that land management strategies to enhance natural pest control should differ depending on whether the dominant enemies are generalists or specialists. The positive response of natural enemies does not necessarily translate into pest control, since pest abundances show no significant response to landscape complexity. Very few landscape-scale studies have estimated enemy impact on pest populations, however, limiting our understanding of the effects of landscape on pest control. We suggest focusing future research efforts on measuring population dynamics rather than static counts to better characterise the relationship between landscape complexity and pest control services from natural enemies. © 2011 Blackwell Publishing Ltd/CNRS.
Semantic annotation of consumer health questions.

Science.gov (United States)

Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina

2018-02-06

Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most
Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

Directory of Open Access Journals (Sweden)

Deng Jixin

2009-02-01

were assigned to the 3 root terms. The Version 5 GO annotation is publicly queryable via the GO site http://amigo.geneontology.org/cgi-bin/amigo/go.cgi. Additionally, the genome of M. oryzae is constantly being refined and updated as new information is incorporated. For the latest GO annotation of Version 6 genome, please visit our website http://scotland.fgl.ncsu.edu/smeng/GoAnnotationMagnaporthegrisea.html. The preliminary GO annotation of Version 6 genome is placed at a local MySQL database that is publicly queryable via a user-friendly interface Adhoc Query System. Conclusion Our analysis provides comprehensive and robust GO annotations of the M. oryzae genome assemblies that will be solid foundations for further functional interrogation of M. oryzae.
Predicting word sense annotation agreement

DEFF Research Database (Denmark)

Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier

2015-01-01

High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotations. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...
Selecting landscape metrics as indicators of spatial heterogeneity-A comparison among Greek landscapes

Science.gov (United States)

Plexida, Sofia G.; Sfougaris, Athanassios I.; Ispikoudis, Ioannis P.; Papanastasis, Vasilios P.

2014-02-01

This paper investigates the spatial heterogeneity of three landscapes along an altitudinal gradient and different human land use. The main aim was the identification of appropriate landscape indicators using different extents. ASTER image was used to create a land cover map consisting of three landscapes which differed in altitude and land use. A number of landscape metrics quantifying patch complexity, configuration, diversity and connectivity were derived from the thematic map at the landscape level. There were significant differences among the three landscapes regarding these four aspects of landscape heterogeneity. The analysis revealed a specific pattern of land use where lowlands are being increasingly utilized by humans (percentage of agricultural land = 65.84%) characterized by physical connectedness (high values of Patch Cohesion Index) and relatively simple geometries (low values of fractal dimension index). The landscape pattern of uplands was found to be highly diverse based upon the Shannon Diversity index. After selecting the scale (600 ha) where metrics values stabilized, it was shown that metrics were more correlated at the small scale of 60 ha. From the original 24 metrics, 14 individual metrics with high Spearman correlation coefficient and Variance Inflation Factor criterion were eliminated, leaving 10 representative metrics for subsequent analysis. Data reduction analysis showed that Patch Density, Area-Weighted Mean Fractal Dimension Index and Patch Cohesion Index are suitable to describe landscape patterns irrespective of the scale. A systematic screening of these metrics could enhance a deeper understanding of the results obtained by them and contribute to a sustainable landscape management of Mediterranean landscapes.
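Two of the landscape-level metrics named in this abstract have simple closed forms, sketched below under the usual definitions: the Shannon Diversity Index is the entropy of the class area proportions, and Patch Density is the number of patches per 100 ha. The function names and the land-cover figures are hypothetical and are not data from the paper.

```python
import math


def shannon_diversity(class_areas):
    """Shannon Diversity Index: -sum(p_i * ln p_i) over land-cover classes,
    where p_i is the proportion of the landscape occupied by class i."""
    total = sum(class_areas.values())
    if total <= 0:
        raise ValueError("total landscape area must be positive")
    return -sum(
        (area / total) * math.log(area / total)
        for area in class_areas.values()
        if area > 0
    )


def patch_density(n_patches, landscape_area_ha):
    """Patch Density expressed as number of patches per 100 hectares."""
    if landscape_area_ha <= 0:
        raise ValueError("landscape area must be positive")
    return n_patches / landscape_area_ha * 100.0


# Hypothetical 600 ha landscape (areas in ha); not data from the study.
areas = {"agriculture": 300.0, "forest": 150.0, "shrubland": 150.0}
print(shannon_diversity(areas), patch_density(42, 600.0))
```

A landscape dominated by a single class gives an index near 0, while evenly shared classes give the maximum value ln(number of classes), which is why the index is read as a measure of diversity.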
Alignment-Annotator web server: rendering and annotating sequence alignments.

Science.gov (United States)

Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

2014-07-01

Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Enabling Histopathological Annotations on Immunofluorescent Images through Virtualization of Hematoxylin and Eosin.

Science.gov (United States)

Lahiani, Amal; Klaiman, Eldad; Grimm, Oliver

2018-01-01

Medical diagnosis and clinical decisions rely heavily on the histopathological evaluation of tissue samples, especially in oncology. Historically, classical histopathology has been the gold standard for tissue evaluation and assessment by pathologists. The most widely used dyes in histopathology are hematoxylin and eosin (H&E), as the diagnosis of most malignancies is largely based on this protocol. H&E staining has been used for more than a century to identify the tissue characteristics and structural morphologies that are needed for tumor diagnosis. In many cases, as tissue is scarce in clinical studies, fluorescence imaging is necessary to allow staining of the same specimen with multiple biomarkers simultaneously. Since fluorescence imaging is a relatively new technology in the pathology landscape, histopathologists are not used to or trained in annotating or interpreting these images. To allow pathologists to annotate these images without the need for additional training, we designed an algorithm for the conversion of fluorescence images to brightfield H&E images. In this algorithm, we use fluorescent nuclei staining to reproduce the hematoxylin information and natural tissue autofluorescence to reproduce the eosin information, avoiding the need to specifically stain the proteins or intracellular structures with an additional fluorescence stain. Our method is based on optimizing a transform function from fluorescence to H&E images using least mean square optimization. It results in high quality virtual H&E digital images that can easily and efficiently be analyzed by pathologists. We validated our results with pathologists by having them annotate tumor in real and virtual H&E whole slide images, and we obtained promising results. Hence, we provide a solution that enables pathologists to assess tissue and annotate specific structures based on multiplexed fluorescence images.
A Stakeholders’ Analysis of Eastern Mediterranean Landscapes: Contextualities, Commonalities and Concerns

Directory of Open Access Journals (Sweden)

Theano S. Terkenli

2017-12-01

This study aims at demonstrating and critically assessing high-level landscape stakeholders’ perceptions and understandings of landscape-related issues, threats and problems in the Eastern Mediterranean, through a purposive comparative research survey of four case studies: Cyprus, Greece, Jordan and Lebanon. Employing qualitative data analysis of intensive stakeholder interviews, performed in the broader context of the MEDSCAPES ENPI-MED project (www.enpi-medscapes.org), the paper draws together the insights and concerns of a total of 61 public entities, private entrepreneurs, academicians and NGO representatives on landscape knowledge, understanding, management and public awareness in these four countries. The results point to significant commonalities among them and begin to show the relational and synthetic nature of the interrelationship between humans and the landscape, as it developed in the context of the local and regional geographies and histories of this broader region, affected by and involving a series of relevant geophysical, economic, political, social, moral, institutional and other parameters.
Functional annotation of hierarchical modularity.

Directory of Open Access Journals (Sweden)

Kanchana Padmanabhan

In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our
Urban-Historical Landscape Analysis on the Basis of Mental Perceptions Case Study: Tajrish Neighborhood

Directory of Open Access Journals (Sweden)

Anoosheh Gohari

2015-12-01

Despite the close affinity between collective memory and urban structures as the relationship between what is hidden and what is visible, rapid changes throughout the city have caused a disconnection between integrated memories and landscape cohesion. As a context for memories, the historical urban landscape proves to be valuable. The present research seeks to identify elements and signs in urban landscape design that are associated with collective memories and to determine the extent of their impact on the maintainability and consolidation of cultural integrity and attachment to residential areas and urban spaces. This raises a question: which kinds of elements help us to reach landscape perception in relation to collective memory? According to the studies, the major categories that influence mental perceptions are elements that affect landscape, mental attachment, rootedness, and social relations. Identification and utilization of these categories in urban landscape design would enable the perception of the landscape as a mental reality that is tied to the memories of the users of the space, which is possible with elements such as signs in the landscape. To address the research inquiries, the researcher surveyed components of collective memory via the landscape analysis method. The use of qualitative techniques is dominant in the paper, along with some quantitative methods, and the location under investigation is Shemiran. The research method comprised a field survey and the collection of information regarding the history of the site. In order to answer the research questions, a landscape analysis method based on subjective perceptions was selected. The statistical population of the study included 30 residents of the district who were 30 years old or older. The respondents were presented with the obtained elements, as well as 6 pictures, in order to score them based on their subjective perception. Questionnaire data was analyzed and elements that
Annotation-based feature extraction from sets of SBML models.

Science.gov (United States)

Alm, Rebekka; Waltemath, Dagmar; Wolfien, Markus; Wolkenhauer, Olaf; Henkel, Ron

2015-01-01

Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were compiled from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map onto concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison, or model-to-model comparison.
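To make the idea of characterizing a model set by its annotations concrete, here is a minimal sketch that keeps the ontology terms shared by at least a given fraction of the models in a set. It is a simple frequency filter written for illustration and is not claimed to be one of the four methods in the paper; the function name, the threshold, the model names and the term identifiers used as sample data are all assumptions.

```python
from collections import Counter

def characteristic_terms(model_annotations, min_fraction=0.5):
    """Return {term: fraction of models annotated with it} for frequent terms.

    model_annotations maps a model name to the list of ontology term
    identifiers attached to that model.
    """
    n_models = len(model_annotations)
    if n_models == 0:
        return {}
    counts = Counter()
    for terms in model_annotations.values():
        # Count each term once per model, however often it is repeated.
        counts.update(set(terms))
    return {
        term: count / n_models
        for term, count in counts.items()
        if count / n_models >= min_fraction
    }

# Hypothetical model set; identifiers are placeholders for illustration.
models = {
    "m1": ["GO:0006096", "CHEBI:17234"],
    "m2": ["GO:0006096", "SBO:0000176"],
    "m3": ["GO:0006096", "CHEBI:17234", "CHEBI:17234"],
}
features = characteristic_terms(models)
print(features)
```

Comparing the dictionaries obtained for two different model sets gives a crude set-to-set comparison of the kind the abstract describes.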
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

Science.gov (United States)

2012-01-01

Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

Directory of Open Access Journals (Sweden)

Liu Chang

2012-12-01

Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.
Annotation of nerve cord transcriptome in earthworm Eisenia fetida

Directory of Open Access Journals (Sweden)

Vasanthakumar Ponesakki

2017-12-01

In annelid worms, the nerve cord serves as a crucial organ to control the sensory and behavioral physiology. The inadequate genome resource of earthworms has prioritized the comprehensive analysis of their transcriptome dataset to monitor the genes expressed in the nerve cord and predict their role in the neurotransmission and sensory perception of the species. The present study focuses on identifying the potential transcripts and predicting their functional features by annotating the transcriptome dataset of nerve cord tissues prepared by Gong et al. (2010) from the earthworm Eisenia fetida. In total, 9762 transcripts were successfully annotated against the NCBI nr database using the BLASTX algorithm, and among them 7680 transcripts were assigned to a total of 44,354 GO terms. The conserved domain analysis indicated the overrepresentation of the P-loop NTPase domain and the calcium-binding EF-hand domain. The COG functional annotation classified 5860 transcript sequences into 25 functional categories. Further, 4502 contig sequences were found to map to 124 KEGG pathways. The annotated contig dataset exhibited 22 crucial neuropeptides having considerable matches to the marine annelid Platynereis dumerilii, suggesting their possible role in neurotransmission and neuromodulation. In addition, 108 human stem cell marker homologs were identified, including the crucial epigenetic regulators, transcriptional repressors and cell cycle regulators, which may contribute to neuronal and segmental regeneration. The complete functional annotation of this nerve cord transcriptome can be further utilized to interpret genetic and molecular mechanisms associated with neuronal development, nervous system regeneration and nerve cord function.
Expressed Peptide Tags: An additional layer of data for genome annotation

Energy Technology Data Exchange (ETDEWEB)

Savidor, Alon [ORNL; Donahoo, Ryan S [ORNL; Hurtado-Gonzales, Oscar [University of Tennessee, Knoxville (UTK); Verberkmoes, Nathan C [ORNL; Shah, Manesh B [ORNL; Lamour, Kurt H [ORNL; McDonald, W Hayes [ORNL

2006-01-01

While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller sub-databases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ~77% of Phytophthora EPTs supported the current annotation, a portion of them (7.2% and 12.6% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
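The six-frame translation mentioned in this abstract can be sketched in a few lines. The example below is a generic illustration using the standard genetic code; it is not the authors' pipeline, and the function names and the short test sequence are assumptions made for the example. Codons containing characters other than A, C, G or T are translated as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translation(seq):
    """Translate both strands in all three reading frames."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

frames = six_frame_translation("ATGGCCTAA")  # hypothetical toy sequence
for name, protein in frames.items():
    print(name, protein)
```

In a search database built this way, each frame would typically be split at stop codons into candidate peptides before matching against mass spectra.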
Lines of landscape organisation

DEFF Research Database (Denmark)

Løvschal, Mette

2015-01-01

This paper offers a landscape analysis of the earliest linear landscape boundaries on Skovbjerg Moraine, Denmark, during the first millennium BC. Using Delaunay triangulation as well as classic distribution analyses, it demonstrates that landscape boundaries articulated already established use-pa...
Snap: an integrated SNP annotation platform

DEFF Research Database (Denmark)

Li, Shengting; Ma, Lijia; Li, Heng

2007-01-01

Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical...

Automating Ontological Annotation with WordNet

Energy Technology Data Exchange (ETDEWEB)

Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

2006-01-22

Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.
Assessment of community-submitted ontology annotations from a novel database-journal partnership.

Science.gov (United States)

Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

2012-01-01

As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.
Comparison of concept recognizers for building the Open Biomedical Annotator

Directory of Open Access Journals (Sweden)

Rubin Daniel

2009-09-01

The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers: NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service-oriented applications. Based on our analysis we also suggest areas of potential improvement for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
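Precision and recall, two of the criteria used in this comparison, can be computed by matching a recognizer's output against a reference set of annotations. The sketch below assumes that an annotation is a hypothetical (start, end, concept_id) tuple and that only exact matches count; the paper's actual matching rules and data are not reproduced here.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall of predicted annotations against gold."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations: (start offset, end offset, concept identifier).
predicted = [(0, 5, "C1"), (10, 15, "C2"), (20, 25, "C3")]
gold = [(0, 5, "C1"), (10, 15, "C2"), (30, 35, "C4"), (40, 45, "C5")]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predictions are correct and two of the four reference annotations are found, so precision is 2/3 and recall is 1/2.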
Multifunctional landscape practice and accessibility in manorial landscapes

DEFF Research Database (Denmark)

Brandt, Jesper; Svenningsen, Stig Roar; Christensen, Andreas Aagaard

. However, classical manorial estates seem to represent an opposite trend. Although working under the same market conditions as other large specialized holdings developed through the process of structural rationalization, they have often maintained and elaborated a land use strategy based on a multifunctional...... use of the potential ecosystem services present within their domain. The targeted combination of agriculture, forestry, hunting rents, rental housing, and a variety of recreational activities makes a certain public accessibility an integrated part of this strategy, diverging from...... the multifunctional landscape strategy supporting a certain public access. A study of this thesis is presented based on an analysis of multifunctionality, landscape development and accessibility in Danish Manorial landscapes and eventual linkages between their multifunctional landscape strategy, their history...
Topological data analysis of financial time series: Landscapes of crashes

Science.gov (United States)

Gidea, Marian; Katz, Yuri

2018-02-01

We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000 and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their Lp-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the Lp-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of Lp-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000 or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which complements the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
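The step from a persistence diagram to the Lp-norm of its persistence landscape can be sketched without any TDA library. The code below uses the usual definition of the k-th landscape function as the k-th largest value of max(0, min(t - b, d - t)) over the (birth, death) pairs, and integrates with the trapezoid rule. The diagram values, the grid and the function names are assumptions for illustration; computing the diagram itself from a sliding-window point cloud is outside this sketch.

```python
def persistence_landscape(diagram, grid, k_max=2):
    """Evaluate the first k_max landscape functions on the points of grid.

    diagram is a list of (birth, death) pairs; the result is a list of
    k_max rows, one value per grid point.
    """
    landscape = [[0.0] * len(grid) for _ in range(k_max)]
    for j, t in enumerate(grid):
        # Tent function of every pair at t, largest first.
        tents = sorted(
            (max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True
        )
        for k, value in enumerate(tents[:k_max]):
            landscape[k][j] = value
    return landscape

def lp_norm(landscape, grid, p=1):
    """Lp-norm of the landscape: integrate |lambda_k|^p over t, sum over k."""
    total = 0.0
    for values in landscape:
        for j in range(len(grid) - 1):
            left = abs(values[j]) ** p
            right = abs(values[j + 1]) ** p
            total += 0.5 * (left + right) * (grid[j + 1] - grid[j])
    return total ** (1.0 / p)

# Hypothetical diagram with two loops and an evenly spaced grid on [0, 2].
diagram = [(0.0, 2.0), (0.5, 1.5)]
grid = [i / 100 for i in range(201)]

landscape = persistence_landscape(diagram, grid)
l1 = lp_norm(landscape, grid, p=1)
print(l1)
```

Repeating this for the diagram of each sliding window yields a time series of norms, which is the quantity the abstract tracks ahead of the two crashes.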
Concept annotation in the CRAFT corpus.

Science.gov (United States)

Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

2012-07-09

Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
A Flexible Object-of-Interest Annotation Framework for Online Video Portals

Directory of Open Access Journals (Sweden)

Robert Sorschag

2012-02-01

In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations are suitable to retrieve specific video scenes for object-related text queries, which is not possible with the manually generated metadata that is used by current portals. We are not the first to present object annotations that are generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can be easily used as a background module in any video environment. Secondly, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs, and the recognition approaches in use can be reconfigured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments, and a database schema is presented to store the resulting video annotations as well as the off-line generated low-level features in a compact form. We achieve promising results in an annotation case study and the instance search task of the TRECVID 2011 challenge.
Making web annotations persistent over time

Energy Technology Data Exchange (ETDEWEB)

Sanderson, Robert [Los Alamos National Laboratory]; Van De Sompel, Herbert [Los Alamos National Laboratory]

2010-01-01

As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
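The datetime negotiation that Memento relies on can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the `Accept-Datetime` request header as later standardized for Memento in RFC 7089, and the function name and the example date are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(annotated_at: datetime) -> dict:
    """Build request headers asking a Memento TimeGate for the version
    of a resource that was current when an annotation was created.

    Assumption: the TimeGate honours the Accept-Datetime header (RFC 7089).
    """
    # Memento expects an HTTP-date in GMT, e.g. "Fri, 01 Jan 2010 00:00:00 GMT".
    as_utc = annotated_at.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Hypothetical annotation timestamp.
headers = accept_datetime_headers(datetime(2010, 1, 1, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers to a TimeGate for the annotated resource and follow the response to the archived version closest to the requested datetime.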
Use of Annotations for Component and Framework Interoperability

Science.gov (United States)

David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

2009-12-01

western United States at the USDA NRCS National Water and Climate Center. PRMS is a component based modular precipitation-runoff model developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow and general basin hydrology. The new OMS 3.0 PRMS model source code is more concise and flexible as a result of using the new framework’s annotation based approach. The fully annotated components are now providing information directly for (i) model assembly and building, (ii) dataflow analysis for implicit multithreading, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Experience to date has demonstrated the multi-purpose value of using annotations. Annotations are also a feasible and practical method to enable interoperability among models and modeling frameworks. As a prototype example, model code annotations were used to generate binding and mediation code to allow the use of OMS 3.0 model components within the OpenMI context.
Cross-Population Joint Analysis of eQTLs: Fine Mapping and Functional Annotation

Science.gov (United States)

Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

2015-01-01

Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool to uncover the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that (i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; (ii) many genes harbor multiple independent eQTLs in their cis regions; and (iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10⁻²²). PMID:25906321
Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

Science.gov (United States)

Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

2016-01-01

Increasing evidence indicates that functional annotation of the human genome at the molecular level and the phenotype level is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool, Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of the human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.
Intra-species sequence comparisons for annotating genomes

Energy Technology Data Exchange (ETDEWEB)

Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

2004-07-15

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.
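The idea of flagging low-variation regions in resequenced individuals can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it uses the fraction of non-majority bases per alignment column as a stand-in for a per-nucleotide mutation rate, and the toy alignment, window size, and threshold are hypothetical.

```python
from collections import Counter


def site_variability(alignment):
    """Per column, the fraction of sequences that differ from the most common base."""
    n = len(alignment)
    return [1 - Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def low_variation_windows(alignment, window=4, max_mean=0.1):
    """Start indices of windows whose mean variability is at most max_mean.

    Such windows are candidates for functionally constrained sequence.
    """
    v = site_variability(alignment)
    return [
        i
        for i in range(len(v) - window + 1)
        if sum(v[i:i + window]) / window <= max_mean
    ]


# Toy alignment of four individuals (hypothetical data, equal-length sequences).
alignment = ["ACGTACGT", "ACGTTCGA", "ACGTACCA", "ACGTGCGT"]
print(low_variation_windows(alignment))  # -> [0]
```

Here only the window starting at position 0 is fully conserved across the four toy sequences; a real analysis would also need to handle alignment gaps, sampling depth, and a proper mutation model.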
Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis

LENUS (Irish Health Repository)

Guida, Alessandro

2011-12-22

Background: Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored. Results: We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed > 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptional active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that similar to C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia. Conclusion: We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes. We found that the transcription factor
Contributions to In Silico Genome Annotation

KAUST Repository

Kalkatawi, Manal M.

2017-11-30

Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered a way of summarizing part of the existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying the functions of these regions is considered functional annotation. In silico approaches can facilitate both tasks, which would otherwise be difficult and time-consuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such a task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameter calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally
Model and Interoperability using Meta Data Annotations

Science.gov (United States)

David, O.

2011-12-01

Software frameworks and architectures are in need of meta data to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format, consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing meta data, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings; however, the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach to meta data representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programming language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interface (API) calls, is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to its originating code. Since models and
Active learning reduces annotation time for clinical concept extraction.

Science.gov (United States)

Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

2017-10-01

To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
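A query strategy of the kind compared in this study can be sketched as below. The abstract does not name the strategies used, so this example assumes least-confidence sampling, one common choice; the function name, sequence ids, and probabilities are hypothetical.

```python
def least_confidence_query(probabilities, k):
    """Return the ids of the k unlabeled items the model is least sure about.

    probabilities maps an item id to the model's predicted class probabilities;
    an item's confidence is taken to be its highest class probability.
    """
    ranked = sorted(probabilities, key=lambda item_id: max(probabilities[item_id]))
    return ranked[:k]


# Hypothetical model outputs for four unlabeled sequences.
pool = {
    "s1": [0.90, 0.10],
    "s2": [0.55, 0.45],
    "s3": [0.60, 0.40],
    "s4": [0.99, 0.01],
}
print(least_confidence_query(pool, 2))  # -> ['s2', 's3']
```

The selected items would be sent to the annotator (optionally with model pre-annotations), added to the training set, and the model retrained before the next query round.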
Rich RNA Structure Landscapes Revealed by Mutate-and-Map Analysis.

Directory of Open Access Journals (Sweden)

Pablo Cordero

2015-11-01

Landscapes exhibiting multiple secondary structures arise in natural RNA molecules that modulate gene expression, protein synthesis, and viral infection. We report herein that high-throughput chemical experiments can isolate an RNA's multiple alternative secondary structures as they are stabilized by systematic mutagenesis (mutate-and-map, M2) and that a computational algorithm, REEFFIT, enables unbiased reconstruction of these states' structures and populations. In an in silico benchmark on non-coding RNAs with complex landscapes, M2-REEFFIT recovers 95% of RNA helices present with at least 25% population while maintaining a low false discovery rate (10%) and conservative error estimates. In experimental benchmarks, M2-REEFFIT recovers the structure landscapes of a 35-nt MedLoop hairpin, a 110-nt 16S rRNA four-way junction with an excited state, a 25-nt bistable hairpin, and a 112-nt three-state adenine riboswitch with its expression platform, molecules whose characterization previously required expert mutational analysis and specialized NMR or chemical mapping experiments. With this validation, M2-REEFFIT enabled tests of whether artificial RNA sequences might exhibit complex landscapes in the absence of explicit design. An artificial flavin mononucleotide riboswitch and a randomly generated RNA sequence are found to interconvert between three or more states, including structures for which there was no design, but that could be stabilized through mutations. These results highlight the likely pervasiveness of rich landscapes with multiple secondary structures in both natural and artificial RNAs and demonstrate an automated chemical/computational route for their empirical characterization.
Analysis and visualisation of lake disappearance process in Iława Lakeland Landscape Park

Directory of Open Access Journals (Sweden)

Marynowska Weronika Cecylia

2018-03-01

Lake disappearance as a natural stage in the evolution of lakes is an extremely important issue in the context of landscape and ecosystem research. Studies of the changes that occur in the lake landscape, characteristic of the northern part of Poland, are aimed at defining the causes and forecasting the results. The capabilities of Geographic Information Systems (GIS) were used in this paper to analyse and visualise the process of lake disappearance in the Iława Lakeland Landscape Park. GIS technologies, which are primarily used for gathering, storing, processing and presenting spatial data, have been used to interpret changes in lake coverage over a period of 100 years. The analyses were based on databases and historical cartographic material such as hydrographic maps, attribute data and bathymetric plans. The data were gathered from different geoportals, then vectorised and preprocessed. Historical maps were rectified. The lake disappearance process was presented in several forms: lake cards, animations and an interactive map. Based on the GIS analysis of lake disappearance in the Iława Lakeland Landscape Park, it was possible to state that lakes are disappearing at a rate of 3.99 ha a⁻¹.
Integrating landscape analysis and planning: a multi-scale approach for oriented management of tourist recreation.

Science.gov (United States)

de Aranzabal, Itziar; Schmitz, María F; Pineda, Francisco D

2009-11-01

Tourism and landscape are interdependent concepts. Nature- and culture-based tourism are now quite well developed activities and can constitute an excellent way of exploiting the natural resources of certain areas, and should therefore be considered as key objectives in landscape planning and management in a growing number of countries. All of this calls for careful evaluation of the effects of tourism on the territory. This article focuses on an integrated spatial method for landscape analysis aimed at quantifying the relationship between preferences of visitors and landscape features. The spatial expression of the model relating types of leisure and recreational preferences to the potential capacity of the landscape to meet them involves a set of maps showing degrees of potential visitor satisfaction. The method constitutes a useful tool for the design of tourism planning and management strategies, with landscape conservation as a reference.
Feedback Driven Annotation and Refactoring of Parallel Programs

DEFF Research Database (Denmark)

Larsen, Per

and communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...... are not effective unless programmers are told how and when they are beneficial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs perform significantly......This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second...
Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

Science.gov (United States)

Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

2016-01-04

The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Trends in landscape research and landscape planning : implications for PhD students

NARCIS (Netherlands)

Tress, G.; Tress, B.; Fry, G.; Antrop, M.

2005-01-01

This chapter introduces the contents of the book through an analysis of current trends in landscape research and landscape planning and a discussion of the consequences of these trends for PhD students.
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Energy Technology Data Exchange (ETDEWEB)

Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

2015-02-10

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

Science.gov (United States)

Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

2015-02-10

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
Graph-based sequence annotation using a data integration approach

Directory of Open Access Journals (Sweden)

Pesch Robert

2008-06-01

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.
A topic modeling approach for web service annotation

Directory of Open Access Journals (Sweden)

Leandro Ordóñez-Ante

2014-06-01

The actual implementation of semantic-based mechanisms for service retrieval has been restricted, given the resource-intensive procedure involved in the formal specification of services, which generally comprises associating semantic annotations with their documentation sources. Typically, a developer performs such a procedure by hand, which requires specialized knowledge of models for semantic description of services (e.g. OWL-S, WSMO, SAWSDL), as well as formal specifications of knowledge. Thus, this semantic-based service description procedure turns out to be a cumbersome and error-prone task. This paper introduces a proposal for service annotation, based on processing web service documentation to extract information regarding its offered capabilities. By uncovering the hidden semantic structure of such information through statistical analysis techniques, we are able to associate meaningful annotations with the services' operations/resources, while grouping those operations into non-exclusive, semantically related categories. This research paper belongs to the TelComp 2.0 project, which Colciencias and the University of Cauca funded in cooperation.
An Atlas of annotations of Hydra vulgaris transcriptome.

Science.gov (United States)

Evangelista, Daniela; Tripathi, Kumar Parijat; Guarracino, Mario Rosario

2016-09-22

RNA sequencing takes advantage of Next Generation Sequencing (NGS) technologies to analyze RNA transcript counts with excellent accuracy. Interpreting this huge amount of data as biological information is still a key issue, for which reason the creation of web resources useful for its analysis is highly desirable. Starting from a previous work, Transcriptator, we present the Atlas of Hydra vulgaris, an extensible web tool in which its complete transcriptome is annotated. In order to provide users with an advantageous resource that includes the whole functionally annotated transcriptome of the Hydra vulgaris water polyp, we implemented the Atlas web tool, which contains 31,988 accessible and downloadable transcripts of this non-reference model organism. Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotation for transcripts differentially expressed in Hydra vulgaris exposed to distinct experimental treatments. WEB RESOURCE URL: http://www-labgtp.na.icar.cnr.it/Atlas .
Geomorpho-Landscapes

Science.gov (United States)

Farabollini, Piero; Lugeri, Francesca; Amadio, Vittorio

2014-05-01

Landscape is the object of human perceptions, being the image of the spatial organization of elements and structures: mankind lives the first approach with the environment, viewing and feeling the landscape. Many definitions of landscape have been given over time: in this case we refer to the Landscape defined as the result of interaction among physical, biotic and anthropic phenomena acting at different spatial-temporal scales (Foreman & Godron). Following an Aristotelian approach in studying nature, we can assert that "Shape is synthesis": so it is possible to read the land features as the expression of the endogenous and exogenous processes that mould earth surfaces; moreover, Landscape is the result of the interaction of natural and cultural components, and conditions the spatial-temporal development of a region. The study of the Landscape offers results useful in order to promote sustainable development, ecotourism, enhancement of natural and cultural heritage, and popularization of scientific knowledge. In Italy, a very important GIS-based tool to represent the territory is the "Carta della Natura" ("Map of Nature", presently coordinated by the ISPRA) that aims at assessing the state of the whole Italian territory, analyzing Landscape. The methodology follows a holistic approach, taking into consideration all the components of a landscape and then integrating the information. Each individual landscape, studied at different scales, shows distinctive elements: structural, which depend on physical form and specific spatial organization; functional, which depend on relationships created between biotic and abiotic elements; and dynamic, which depend on the successive evolution of the structure. The identification of the landscape units, recognized at different scales of analysis, allows an evaluation of the state of the land, referring to the dual risk/resource which characterizes the Italian country. An interesting opportunity is to discover those areas of unusual
The potential of landscape labelling approaches for integrated landscape management in Europe

DEFF Research Database (Denmark)

Mann, Carsten; Plieninger, Tobias

2017-01-01

This paper combines conceptual thinking and empirical analysis of landscape labelling as a new governance approach. With the help of a literature review and qualitative interviews, we (1) explore the conceptual orientation of landscape labelling, (2) analyse existing approaches in Europe and (3......) elaborate its potential for integrated landscape management on a regional scale. Governance analysis to identify fostering and hindering factors is carried out for regional brands in biosphere reserves in Germany, geographic indication in Spain, organic agriculture in France and a community forest...... approach within policy mixes that depend on supportive governance structures and stakeholders....
Landscape Ecology

DEFF Research Database (Denmark)

Christensen, Andreas Aagaard; Brandt, Jesper; Svenningsen, Stig Roar

2017-01-01

, and the ecological significance of the patterns which are generated by such processes. In landscape ecology, perspectives drawn from existing academic disciplines are integrated based on a common, spatially explicit mode of analysis developed from classical holistic geography, emphasizing spatial and landscape...... to translate positivist readings of the environment and hermeneutical perspectives on socioecological interaction into a common framework or terminology....
HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences

Directory of Open Access Journals (Sweden)

Firth Andrew E

2007-12-01

Full Text Available Abstract Background: The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (~3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. Results: These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions – including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot) – is integrated into the database. A large amount of data is graphically presented using GBrowse (Generic Genome Browser), adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. Conclusion: HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV focused datasets and tools (http://hbvregdb.otago.ac.nz). The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.
Computer systems for annotation of single molecule fragments

Science.gov (United States)

Schwartz, David Charles; Severin, Jessica

2016-07-19

There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
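Once cut sites have been marked on a molecule, the restriction fragments follow from the gaps between neighbouring sites. The snippet below is a minimal sketch of that arithmetic only, not the patented system: the function name `restriction_fragments`, the assumption of a linear molecule, and the example numbers are all hypothetical.

```python
def restriction_fragments(molecule_length, cut_sites):
    """Return fragment lengths for a linear molecule cut at `cut_sites`.

    Positions are offsets from one end of the molecule, in the same unit
    as `molecule_length` (for example kilobases).
    """
    sites = sorted(set(cut_sites))
    if any(s <= 0 or s >= molecule_length for s in sites):
        raise ValueError("cut sites must lie strictly inside the molecule")
    # Molecule ends act as the outermost fragment boundaries.
    boundaries = [0] + sites + [molecule_length]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


# Hypothetical molecule of length 100 with three marked cut sites.
print(restriction_fragments(100, [70, 25, 40]))  # [25, 15, 30, 30]
```

Sorting and de-duplicating the sites first means the marks can be entered in any order, as a user annotating an image might do.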
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

Science.gov (United States)

Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

2015-01-01

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Pattern-based compression of multi-band image data for landscape analysis

CERN Document Server

Myers, Wayne L; Patil, Ganapati P

2006-01-01

This book describes an integrated approach to using remotely sensed data in conjunction with geographic information systems for landscape analysis. Remotely sensed data are compressed into an analytical image-map that is compatible with the most popular geographic information systems as well as freeware viewers. The approach is most effective for landscapes that exhibit a pronounced mosaic pattern of land cover. The image maps are much more compact than the original remotely sensed data, which enhances utility on the internet. As value-added products, distribution of image-maps is not affected by copyrights on original multi-band image data.
Image annotation under X Windows

Science.gov (United States)

Pothier, Steven

1991-08-01

A mechanism for attaching graphic and overlay annotation to multiple bits/pixel imagery while providing levels of performance approaching that of native mode graphics systems is presented. This mechanism isolates programming complexity from the application programmer through software encapsulation under the X Window System. It ensures display accuracy throughout operations on the imagery and annotation including zooms, pans, and modifications of the annotation. Trade-offs that affect speed of display, consumption of memory, and system functionality are explored. The use of resource files to tune the display system is discussed. The mechanism makes use of an abstraction consisting of four parts: a graphics overlay, a dithered overlay, an image overlay, and a physical display window. Data structures are maintained that retain the distinction between the four parts so that they can be modified independently, providing system flexibility. A unique technique for associating user color preferences with annotation is introduced. An interface that allows interactive modification of the mapping between image value and color is discussed. A procedure that provides for the colorization of imagery on 8-bit display systems using pixel dithering is explained. Finally, the application of annotation mechanisms to various applications is discussed.
Motion lecture annotation system to learn Naginata performances

Science.gov (United States)

Kobayashi, Daisuke; Sakamoto, Ryota; Nomura, Yoshihiko

2013-12-01

This paper describes a learning assistant system that uses motion capture data and annotation to teach "Naginata-jutsu" (the skill of practicing with the Japanese halberd) performance. There are some video annotation tools, such as YouTube. However, these video-based tools offer only a single angle of view. Our approach, which uses motion-captured data, allows viewing from any angle. A lecturer can write annotations related to parts of the body. We compared the effectiveness of the YouTube annotation tool and the proposed system. The experimental result showed that our system triggered more annotations than the YouTube annotation tool.
JGI Plant Genomics Gene Annotation Pipeline

Energy Technology Data Exchange (ETDEWEB)

Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

2014-07-14

Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.
Annotating temporal information in clinical narratives.

Science.gov (United States)

Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem

2013-12-01

Temporal information in clinical narratives plays an important role in patients' diagnosis, treatment and prognosis. In order to represent narrative information accurately, medical natural language processing (MLP) systems need to correctly identify and interpret temporal information. To promote research in this area, the Informatics for Integrating Biology and the Bedside (i2b2) project developed a temporally annotated corpus of clinical narratives. This corpus contains 310 de-identified discharge summaries, with annotations of clinical events, temporal expressions and temporal relations. This paper describes the process followed for the development of this corpus and discusses annotation guideline development, annotation methodology, and corpus quality. Copyright © 2013 Elsevier Inc. All rights reserved.
The Sigiriya Royal Gardens. Analysis of the Landscape Architectonic Composition

Directory of Open Access Journals (Sweden)

Jude Nilan Cooray

2017-11-01

Full Text Available Apart from efforts of a descriptive and celebratory nature, studies of Sri Lanka’s historical built heritage largely view material remains from historical, sociological, socio-historical and semiological perspectives. There has hardly been any serious attempt to view such material remains from a technical-analytical approach in order to understand the compositional aspects of their designs. The 5th century AC royal complex at Sigiriya is no exception in this regard. The enormous wealth of information and the material remains unearthed during more than a hundred years of field-based research by several generations of archaeologists at Sigiriya provide an ideal opportunity for such an analysis. The present study is, therefore, intended to fill the gap in research related to Sri Lanka’s historical built heritage in general and to Sigiriya in particular. It attempts to read Sigiriya as a landscape architectonic design to expose its architectonic composition and design instruments. The study, which is approached from a technical-analytical point of view, follows a methodological framework developed at the Landscape Design Department of the Faculty of Architecture at Delft University of Technology. The study reveals that the architectonic design of Sigiriya constitutes multiple design layers and multiple layers of significance with material-spatial-metaphorical-functional coherence, and that it has both general and unique landscape architectonic elements, aspects, characteristics and qualities. The richness of its composition also makes it possible to identify the landscape architectural value of Sigiriya, which will help re-shape the policies related to conservation and presentation of Sigiriya as a heritage site as well as its protection and management as a green monument. The positive results of the study also underline that the methodology adapted in this research has devised a framework for the study of other examples

Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae.

Science.gov (United States)

Damte, Dereje; Suh, Joo-Won; Lee, Seung-Jin; Yohannes, Sileshi Belew; Hossain, Md Akil; Park, Seung-Chun

2013-07-01

In the present study, a computational comparative and subtractive genomic/proteomic analysis aimed at the identification of putative therapeutic target and vaccine candidate proteins from Kyoto Encyclopedia of Genes and Genomes (KEGG) annotated metabolic pathways of Mycoplasma hyopneumoniae was performed for drug design and vaccine production pipelines against M. hyopneumoniae. The comparative genomic and metabolic pathway analysis, using a predefined computational systemic workflow, extracted a total of 41 annotated metabolic pathways from KEGG, among which five were unique to M. hyopneumoniae. A total of 234 proteins were identified as being involved in these metabolic pathways. Although 125 non-homologous and predicted essential proteins were found among the total that could serve as potential drug targets and vaccine candidates, additional prioritizing parameters characterized 21 proteins as vaccine candidates, while the druggability of each of the identified proteins, evaluated with the DrugBank database, prioritized 42 proteins as suitable drug targets. Copyright © 2013 Elsevier Inc. All rights reserved.
Facilitating functional annotation of chicken microarray data

Directory of Open Access Journals (Sweden)

Gresham Cathy R

2009-10-01

Full Text Available Abstract Background: Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers a considerable amount of time in searching multiple GO databases for functional information. Results: We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets, we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their dataset. Conclusion: Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and
Annotating Evidence Based Clinical Guidelines : A Lightweight Ontology

NARCIS (Netherlands)

Hoekstra, R.; de Waard, A.; Vdovjak, R.; Paschke, A.; Burger, A.; Romano, P.; Marshall, M.S.; Splendiani, A.

2012-01-01

This paper describes a lightweight ontology for representing annotations of declarative evidence based clinical guidelines. We present the motivation and requirements for this representation, based on an analysis of several guidelines. The ontology provides the means to connect clinical questions
Graph-based sequence annotation using a data integration approach.

Science.gov (United States)

Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

2008-08-25

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.
Dictionary-driven protein annotation.

Science.gov (United States)

Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

2002-09-01

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were
The visual-landscape analysis during the integration of high-rise buildings within the historic urban environment

Science.gov (United States)

Akristiniy, Vera A.; Dikova, Elena A.

2018-03-01

The article is devoted to one type of urban planning study, the visual-landscape analysis during the integration of high-rise buildings within the historic urban environment, carried out to support pre-design and design studies aimed at preserving the historical urban environment and making use of the area's reconstruction resource. The article formulates and systematizes the stages and methods of conducting the visual-landscape analysis, taking into account the influence of high-rise buildings on objects of cultural heritage and valuable historical buildings of the city. Practical application of the visual-landscape analysis provides an opportunity to assess the influence of the hypothetical location of high-rise buildings on the perception of a historically developed environment and optimal building parameters. The contents of the main stages of the visual-landscape analysis and their key aspects are described, concerning the construction of predicted zones of visibility of significant, historically valuable urban development objects and of hypothetically planned high-rise buildings. The obtained data are oriented to the successive development of the planning and typological structure of the city territory and preservation of the compositional influence of valuable fragments of the historical environment in the structure of the urban landscape. On their basis, an information database is formed to determine the permissible urban development parameters of the high-rise buildings for the preservation of the compositional integrity of the urban area.
The visual-landscape analysis during the integration of high-rise buildings within the historic urban environment

Directory of Open Access Journals (Sweden)

Akristiniy Vera A.

2018-01-01

Full Text Available The article is devoted to one type of urban planning study, the visual-landscape analysis during the integration of high-rise buildings within the historic urban environment, carried out to support pre-design and design studies aimed at preserving the historical urban environment and making use of the area's reconstruction resource. The article formulates and systematizes the stages and methods of conducting the visual-landscape analysis, taking into account the influence of high-rise buildings on objects of cultural heritage and valuable historical buildings of the city. Practical application of the visual-landscape analysis provides an opportunity to assess the influence of the hypothetical location of high-rise buildings on the perception of a historically developed environment and optimal building parameters. The contents of the main stages of the visual-landscape analysis and their key aspects are described, concerning the construction of predicted zones of visibility of significant, historically valuable urban development objects and of hypothetically planned high-rise buildings. The obtained data are oriented to the successive development of the planning and typological structure of the city territory and preservation of the compositional influence of valuable fragments of the historical environment in the structure of the urban landscape. On their basis, an information database is formed to determine the permissible urban development parameters of the high-rise buildings for the preservation of the compositional integrity of the urban area.
ANALYSIS OF LANDSCAPE ELEMENTS THAT AFFECT PROPERTY VALUE BASED ON THE PERCEPTION OF HOUSING RESIDENTS IN SURABAYA

Directory of Open Access Journals (Sweden)

Nita Setiawati Wibisono

2017-07-01

Full Text Available This research aims to determine the landscape elements that affect property value, based on housing residents’ perception in Surabaya residential areas. The landscape elements used in this research are natural elements, such as vegetation and soil, and man-made elements, such as garden statues, road pattern, road width and hierarchy, parks and plants, artificial lakes, and road equipment. A purposive sampling technique is used to select respondents in residential areas that provide landscape elements in East and West Surabaya. The data analysis uses validity and reliability tests, factor analysis, binary logistic regression, and a test of variable averages. The results show that the majority of residents of the East Surabaya and West Surabaya residential areas agreed that landscape elements consisting of parks and plants, road width and hierarchy, and road pattern affect their residential property values. The residents are also willing to contribute about 7.4% of their house price to improve the appearance of the residential landscape.
Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

Science.gov (United States)

Good, Benjamin M; Nanis, Max; Wu, Chunlei; Su, Andrew I

2015-01-01

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of the AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
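The reported overall F measure is consistent with the harmonic mean of the stated precision and recall, and the merging of worker annotations can be pictured as a vote count with a threshold. The snippet below is a minimal sketch under those assumptions, not the authors' code: the function names, the `min_votes` parameter, the representation of mentions as character spans, and the toy worker data are all hypothetical.

```python
from collections import Counter


def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def merge_by_votes(worker_annotations, min_votes):
    """Keep every annotation marked by at least `min_votes` workers.

    `worker_annotations` is a list of sets, one per worker; each set holds
    the (start, end) character spans that worker marked as a disease mention.
    """
    votes = Counter(span for spans in worker_annotations for span in spans)
    return {span for span, count in votes.items() if count >= min_votes}


# Precision and recall as reported in the abstract.
print(round(f_measure(0.862, 0.883), 3))  # 0.872

# Hypothetical toy data: three workers annotating one abstract.
workers = [{(0, 8), (20, 31)}, {(0, 8)}, {(0, 8), (40, 47)}]
print(merge_by_votes(workers, min_votes=2))  # {(0, 8)}
```

Raising `min_votes` keeps only spans that many workers agree on, which tends to favour precision, while lowering it favours recall; this mirrors the precision/recall tuning the abstract describes.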
The effectiveness of annotated (vs. non-annotated) digital pathology slides as a teaching tool during dermatology and pathology residencies.

Science.gov (United States)

Marsch, Amanda F; Espiritu, Baltazar; Groth, John; Hutchens, Kelli A

2014-06-01

With today's technology, paraffin-embedded, hematoxylin & eosin-stained pathology slides can be scanned to generate high quality virtual slides. Using proprietary software, digital images can also be annotated with arrows, circles and boxes to highlight certain diagnostic features. Previous studies assessing digital microscopy as a teaching tool did not involve the annotation of digital images. The objective of this study was to compare the effectiveness of annotated digital pathology slides versus non-annotated digital pathology slides as a teaching tool during dermatology and pathology residencies. A study group composed of 31 dermatology and pathology residents was asked to complete an online pre-quiz consisting of 20 multiple choice style questions, each associated with a static digital pathology image. After completion, participants were given access to an online tutorial composed of digitally annotated pathology slides and subsequently asked to complete a post-quiz. A control group of 12 residents completed a non-annotated version of the tutorial. Nearly all participants in the study group improved their quiz score, with an average improvement of 17%, versus only 3% (P = 0.005) in the control group. These results support the notion that annotated digital pathology slides are superior to non-annotated slides for the purpose of resident education. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Automatic annotation of head velocity and acceleration in Anvil

DEFF Research Database (Denmark)

Jongejan, Bart

2012-01-01

We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements....... The annotations are a useful supplement to manual annotations and may help human annotators to quickly and reliably determine onset of head movements and to suggest which kind of head movement is taking place....
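The abstract does not say how the plugin derives velocity and acceleration from the tracked face. One common way, shown here purely as an illustrative sketch, is to take finite differences of the face position between consecutive frames; the function names, the `(x, y)` input format, and the sample track are assumptions.

```python
def finite_difference(values, dt):
    """Forward differences: rate of change between consecutive samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def velocity_and_acceleration(positions, dt):
    """Return per-axis velocity and acceleration for (x, y) positions.

    `positions` is a list of (x, y) face-centre coordinates sampled every
    `dt` seconds (hypothetical input format).
    """
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    vx, vy = finite_difference(xs, dt), finite_difference(ys, dt)
    ax, ay = finite_difference(vx, dt), finite_difference(vy, dt)
    return list(zip(vx, vy)), list(zip(ax, ay))


# Hypothetical track: four frames at 0.5 s intervals.
track = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (6.0, 1.0)]
velocity, acceleration = velocity_and_acceleration(track, dt=0.5)
print(velocity)      # [(2.0, 0.0), (4.0, 2.0), (6.0, 0.0)]
print(acceleration)  # [(4.0, 4.0), (4.0, -4.0)]
```

A sign change or a spike in these series is the kind of cue that could help an annotator locate the onset of a head movement, although real tracker output would usually need smoothing first.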
Mesotext. Framing and exploring annotations

NARCIS (Netherlands)

Boot, P.; Boot, P.; Stronks, E.

2007-01-01

From the introduction: Annotation is an important item on the wish list for digital scholarly tools. It is one of John Unsworth’s primitives of scholarship (Unsworth 2000). Especially in linguistics, a number of tools have been developed that facilitate the creation of annotations to source material
Annotating images by mining image search results

NARCIS (Netherlands)

Wang, X.J.; Zhang, L.; Li, X.; Ma, W.Y.

2008-01-01

Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search
Automatically annotating web pages using Google Rich Snippets

NARCIS (Netherlands)

Hogenboom, F.P.; Frasincar, F.; Vandic, D.; Meer, van der J.; Boon, F.; Kaymak, U.

2011-01-01

We propose the Automatic Review Recognition and annOtation of Web pages (ARROW) framework, a framework for Web page review identification and annotation using RDFa Google Rich Snippets. The ARROW framework consists of four steps: hotspot identification, subjectivity analysis, information
Sequence-based feature prediction and annotation of proteins

DEFF Research Database (Denmark)

Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea

2009-01-01

A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....
Annotation of Regular Polysemy

DEFF Research Database (Denmark)

Martinez Alonso, Hector

Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...... and metonymic. We have conducted an analysis in English, Danish and Spanish. Later on, we have tried to replicate the human judgments by means of unsupervised and semi-supervised sense prediction. The automatic sense-prediction systems have been unable to find empiric evidence for the underspecified sense, even...
A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

Science.gov (United States)

Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

2015-05-27

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
WormBase: Annotating many nematode genomes.

Science.gov (United States)

Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

2012-01-01

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.
Globalization and Landscape Architecture

Directory of Open Access Journals (Sweden)

Robert R. Hewitt

2014-02-01

Full Text Available The literature review examines globalization and landscape architecture as discourse, samples its various meanings, and proposes methods to identify and contextualize its specific literature. Methodologically, the review surveys published articles and books by leading authors and within the WorldCat.org Database associated with landscape architecture and globalization, analyzing survey results for comprehensive conceptual and co-relational frameworks. Three “higher order” dimensions frame the review’s conceptual organization, facilitating the organization of subordinate/subtopical areas of interest useful for comparative analysis. Comparative analysis of the literature suggests an uneven clustering of discipline-related subject matter across the literature’s “higher order” dimensions, with a much smaller body of literature related to landscape architecture confined primarily to topics associated with the dispersion of global phenomena. A subcomponent of this smaller body of literature is associated with other fields of study, but inferentially related to landscape architecture. The review offers separate references and bibliographies for globalization literature in general and globalization and landscape architecture literature, specifically.
SplicingTypesAnno: annotating and quantifying alternative splicing events for RNA-Seq data.

Science.gov (United States)

Sun, Xiaoyong; Zuo, Fenghua; Ru, Yuanbin; Guo, Jiqiang; Yan, Xiaoyan; Sablok, Gaurav

2015-04-01

Alternative splicing plays a key role in the regulation of the central dogma. Four major types of alternative splicing have been classified: intron retention, exon skipping, alternative 5′ splice sites (alternative donor sites), and alternative 3′ splice sites (alternative acceptor sites). A few algorithms have been developed to detect splice junctions from RNA-Seq reads. However, there are few tools targeting the major alternative splicing types at the exon/intron level. This type of analysis may reveal subtle yet important events of alternative splicing, and thus help gain a deeper understanding of the mechanism of alternative splicing. This paper describes a user-friendly R package for extracting, annotating and analyzing alternative splicing types for sequence alignment files from RNA-Seq. SplicingTypesAnno can: (1) provide annotation for major alternative splicing at the exon/intron level and, by comparing against the annotation from a GTF/GFF file, identify novel alternative splicing sites; (2) offer a convenient two-level analysis: genome-scale annotation for users with a high performance computing environment, and gene-scale annotation for users with personal computers; (3) generate a user-friendly web report and additional BED files for IGV visualization. SplicingTypesAnno is publicly available at https://sourceforge.net/projects/splicingtypes/files/ or http://genome.sdau.edu.cn/research/software/SplicingTypesAnno.html. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

Teaching and Learning Communities through Online Annotation

Science.gov (United States)

van der Pluijm, B.

2016-12-01

What do colleagues do with your assigned textbook? What do they say or think about the material? Want students to be more engaged in their learning experience? If so, online materials that complement the standard lecture format provide a new opportunity through managed, online group annotation that leverages the ubiquity of internet access, while personalizing learning. The concept is illustrated with the new online textbook "Processes in Structural Geology and Tectonics", by Ben van der Pluijm and Stephen Marshak, which offers a platform for sharing of experiences, supplementary materials and approaches, including readings, mathematical applications, exercises, challenge questions, quizzes, alternative explanations, and more. The annotation framework used is Hypothes.is, which offers a free, open platform markup environment for annotation of websites and PDF postings. The annotations can be public, grouped or individualized, as desired, including export access and download of annotations. A teacher group, hosted by a moderator/owner, limits access to members of a user group of teachers, so that its members can use, copy or transcribe annotations for their own lesson material. Likewise, an instructor can host a student group that encourages sharing of observations, questions and answers among students and instructor. Also, the instructor can create one or more closed groups that offer study help and hints to students. Options galore, all of which aim to engage students and to promote greater responsibility for their learning experience. Beyond new capacity, the ability to analyze student annotation supports individual learners and their needs. For example, student notes can be analyzed for key phrases and concepts to identify misunderstandings, omissions and problems. Also, example annotations can be shared to enhance notetaking skills and to help with studying. Lastly, online annotation allows active application to posted lecture slides, supporting real-time notetaking
Displaying Annotations for Digitised Globes

Science.gov (United States)

Gede, Mátyás; Farbinger, Anna

2018-05-01

Thanks to the efforts of the various globe digitising projects, there are now plenty of old globes that can be examined as 3D models on the computer screen. These globes usually contain many interesting details that an average observer would not fully discover at first glance. The authors developed a website that can display annotations for such digitised globes. These annotations help observers of the globe to discover all the important, interesting details. Each annotation consists of a plain text title, an HTML-formatted descriptive text and a corresponding polygon, and is stored in KML format. The website is powered by the Cesium virtual globe engine.
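The annotation structure described in this record (title, HTML description, polygon, stored as KML) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `build_annotation_kml`, the choice of one Placemark per annotation, and the sample coordinates are all assumptions made for illustration; only standard KML 2.2 element names are used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def build_annotation_kml(title, html_description, ring):
    """Return a KML string holding one annotation as a Placemark (hypothetical layout).

    ring is a list of (longitude, latitude) pairs outlining the polygon.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_q("kml"))
    document = ET.SubElement(kml, _q("Document"))
    placemark = ET.SubElement(document, _q("Placemark"))
    ET.SubElement(placemark, _q("name")).text = title
    # ElementTree escapes the HTML markup, so it is stored as ordinary text.
    ET.SubElement(placemark, _q("description")).text = html_description

    polygon = ET.SubElement(placemark, _q("Polygon"))
    outer = ET.SubElement(polygon, _q("outerBoundaryIs"))
    linear_ring = ET.SubElement(outer, _q("LinearRing"))
    # KML rings must be closed: repeat the first vertex at the end if needed.
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    coordinates = ET.SubElement(linear_ring, _q("coordinates"))
    coordinates.text = " ".join(f"{lon},{lat},0" for lon, lat in closed)
    return ET.tostring(kml, encoding="unicode")
```

A viewer would read the three fields back from each Placemark; how the original website maps them onto the Cesium scene is not stated in the abstract.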
MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

DEFF Research Database (Denmark)

Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund

2017-01-01

number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post...... pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets....
THE DIMENSIONS OF COMPOSITION ANNOTATION.

Science.gov (United States)

MCCOLLY, WILLIAM

ENGLISH TEACHER ANNOTATIONS WERE STUDIED TO DETERMINE THE DIMENSIONS AND PROPERTIES OF THE ENTIRE SYSTEM FOR WRITING CORRECTIONS AND CRITICISMS ON COMPOSITIONS. FOUR SETS OF COMPOSITIONS WERE WRITTEN BY STUDENTS IN GRADES 9 THROUGH 13. TYPESCRIPTS OF THE COMPOSITIONS WERE ANNOTATED BY CLASSROOM ENGLISH TEACHERS. THEN, 32 ENGLISH TEACHERS JUDGED…
Evaluation of three automated genome annotations for Halorhabdus utahensis.

Directory of Open Access Journals (Sweden)

Peter Bakke

2009-07-01

Full Text Available Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.
Navigating the Interface Between Landscape Genetics and Landscape Genomics

Directory of Open Access Journals (Sweden)

Andrew Storfer

2018-03-01

Full Text Available As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species
Analysis of high-throughput sequencing and annotation strategies for phage genomes.

Directory of Open Access Journals (Sweden)

Matthew R Henn

Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.
MimoSA: a system for minimotif annotation

Directory of Open Access Journals (Sweden)

Kundeti Vamsi

2010-06-01

Full Text Available Background: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to
Characterizing structural transitions using localized free energy landscape analysis.

Directory of Open Access Journals (Sweden)

Nilesh K Banavali

Full Text Available Structural changes in molecules are frequently observed during biological processes like replication, transcription and translation. These structural changes can usually be traced to specific distortions in the backbones of the macromolecules involved. Quantitative energetic characterization of such distortions can greatly advance the atomic-level understanding of the dynamic character of these biological processes. Molecular dynamics simulations combined with a variation of the Weighted Histogram Analysis Method for potential of mean force determination are applied to characterize localized structural changes for the test case of cytosine (underlined) base flipping in a GTCAGCGCATGG DNA duplex. Free energy landscapes for backbone torsion and sugar pucker degrees of freedom in the DNA are used to understand their behavior in response to the base flipping perturbation. By simplifying the base flipping structural change into a two-state model, a free energy difference of up to 14 kcal/mol can be attributed to the flipped state relative to the stacked Watson-Crick base paired state. This two-state classification allows precise evaluation of the effect of base flipping on local backbone degrees of freedom. The calculated free energy landscapes of individual backbone and sugar degrees of freedom expectedly show the greatest change in the vicinity of the flipping base itself, but specific delocalized effects can be discerned up to four nucleotide positions away in both 5' and 3' directions. Free energy landscape analysis thus provides a quantitative method to pinpoint the determinants of structural change on the atomic scale and also delineate the extent of propagation of the perturbation along the molecule. In addition to nucleic acids, this methodology is anticipated to be useful for studying conformational changes in all macromolecules, including carbohydrates, lipids, and proteins.
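The abstract does not state how the two-state free energy difference is computed. One common way to reduce a one-dimensional potential of mean force to two states is to Boltzmann-weight the bins on either side of a dividing value, giving ΔG = -k_B T ln(Z_high / Z_low). The Python sketch below illustrates that textbook reduction only; the function name, the evenly spaced bins, the dividing `boundary` and the default temperature are assumptions, and it is not the authors' WHAM variant.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def two_state_free_energy(coords, pmf, boundary, temperature=300.0):
    """Free energy (kcal/mol) of the state with coord >= boundary
    relative to the state with coord < boundary.

    coords: evenly spaced reaction-coordinate values (assumed layout)
    pmf:    potential of mean force at each value, in kcal/mol
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    # Sum Boltzmann weights of the bins belonging to each state.
    z_low = sum(math.exp(-beta * w) for x, w in zip(coords, pmf) if x < boundary)
    z_high = sum(math.exp(-beta * w) for x, w in zip(coords, pmf) if x >= boundary)
    return -math.log(z_high / z_low) / beta
```

With one bin per state, the result is simply the difference in PMF values; with many bins, broad basins are favoured over narrow ones of equal depth, which is the reason for summing weights rather than comparing minima.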
Integrative specimen information service - a campus-wide resource for tissue banking, experimental data annotation, and analysis services.

Science.gov (United States)

Schadow, Gunther; Dhaval, Rakesh; McDonald, Clement J; Ragg, Susanne

2006-01-01

We present the architecture and approach of an evolving campus-wide information service for tissues with clinical and data annotations to be used and contributed to by clinical researchers across the campus. The services provided include specimen tracking, long term data storage, and computational analysis services. The project is conceived and sustained by collaboration among researchers on the campus as well as participation in standards organizations and national collaboratives.
ANALYSIS OF THE WOODY VEGETATION DYNAMICS IN THE AREA OF TREE LINE ECOTONE ON THE BASIS OF PHOTO MONITORING DATA AND USING GIS

Directory of Open Access Journals (Sweden)

A. P. Mikhailovich

2016-01-01

Full Text Available A method of processing and presenting repeated landscape photographs for analysis of the spatio-temporal dynamics of woody vegetation in the tree line ecotone of the Polar Urals (Rai-Iz mountain) was developed. It is intended to solve problems with the use of such photographs so as to help the researcher gain an integral representation of the space under study, obtain additional information about the region of interest, create and update annotations to photographs, and develop thematic maps using repeated landscape photography.
Diverse Image Annotation

KAUST Repository

Wu, Baoyuan

2017-11-09

In this work we study the task of image annotation, whose goal is to describe an image using a few tags. Instead of predicting the full list of tags, here we aim to provide a short list of tags under a limited number (e.g., 3) that covers as much information about the image as possible. The tags in such a short list should be representative and diverse: they are required not only to correspond to the contents of the image, but also to be different from each other. To this end, we treat image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which formulates representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in a semantic hierarchy or in a pair of synonyms should not be selected simultaneously. This requirement is then embedded into the sampling algorithm according to the learned conditional DPP model. Besides, we find that traditional metrics for image annotation (e.g., precision, recall and F1 score) only consider representation and ignore diversity. Thus we propose new metrics to evaluate the quality of the selected subset (i.e., the tag list), based on the semantic hierarchy and synonyms. A human study through Amazon Mechanical Turk verifies that the proposed metrics are closer to human judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags, compared with existing image annotation methods.
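The idea of choosing a short tag list that is both representative and diverse, while never picking two tags from the same synonym pair or hierarchy, can be sketched in Python as below. This is a simplified stand-in, not the paper's method: it greedily maximizes the determinant of a DPP-style kernel L_ij = q_i S_ij q_j instead of sampling from a learned conditional DPP, and the quality scores, similarity matrix and excluded pairs are hypothetical inputs.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_diverse_tags(tags, quality, similarity, excluded_pairs, k=3):
    """Greedily pick up to k tags maximizing det(L_S), with L_ij = q_i * S_ij * q_j.

    excluded_pairs lists tag pairs (synonyms or hierarchy relatives, supplied
    by the caller) that must not be selected together.
    """
    n = len(tags)
    kernel = [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
              for i in range(n)]
    banned = {frozenset(pair) for pair in excluded_pairs}
    chosen = []
    while len(chosen) < k:
        best, best_det = None, 0.0
        for i in range(n):
            if i in chosen:
                continue
            # Skip candidates that conflict with an already chosen tag.
            if any(frozenset((tags[i], tags[j])) in banned for j in chosen):
                continue
            idx = chosen + [i]
            d = determinant([[kernel[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break  # no admissible tag adds volume
        chosen.append(best)
    return [tags[i] for i in chosen]
```

The determinant grows with tag quality and shrinks when chosen tags are similar, so a strong but redundant tag loses to a weaker, dissimilar one. The exclusion check may return fewer than k tags when every remaining candidate conflicts with the selection.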
Diverse Image Annotation

KAUST Repository

Wu, Baoyuan; Jia, Fan; Liu, Wei; Ghanem, Bernard

2017-01-01

In this work we study the task of image annotation, whose goal is to describe an image using a few tags. Instead of predicting the full list of tags, here we aim to provide a short list of tags under a limited number (e.g., 3) that covers as much information about the image as possible. The tags in such a short list should be representative and diverse: they are required not only to correspond to the contents of the image, but also to be different from each other. To this end, we treat image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which formulates representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in a semantic hierarchy or in a pair of synonyms should not be selected simultaneously. This requirement is then embedded into the sampling algorithm according to the learned conditional DPP model. Besides, we find that traditional metrics for image annotation (e.g., precision, recall and F1 score) only consider representation and ignore diversity. Thus we propose new metrics to evaluate the quality of the selected subset (i.e., the tag list), based on the semantic hierarchy and synonyms. A human study through Amazon Mechanical Turk verifies that the proposed metrics are closer to human judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags, compared with existing image annotation methods.
Annotating individual human genomes.

Science.gov (United States)

Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

2011-10-01

Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.
ANNOTATING INDIVIDUAL HUMAN GENOMES*

Science.gov (United States)

Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

2014-01-01

Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162
Landscape and land use history of Eurajoki between 1840 and 2007: Analysis of geographical data and landscape transformation

International Nuclear Information System (INIS)

Koistinen, T.; Kaeyhkoe, N.

2011-11-01

, vectorizing of the selected elements, standardizing of the heterogeneous spatial data and analyzing and estimating of the quality of the produced data. In addition, the report includes a description of the landscape change study methods used and visualizations of the produced spatial data and change analysis results. The future possibilities and restrictions of the data are also discussed. The main challenges for their use are thematic differences and inconsistencies between the original map products. Different scales and geometric errors also cause some concerns for future use. The spatial data produced from the more general map products can, however, turn out to be useful, for example as approaching maps. The time scale of the study starts from the 1840s and ends in 2007. The maps used in this project turned out to vary greatly in their consistency and reliability. This variation was also evident between the landscape elements. Fields, meadows and buildings were depicted in the most consistent manner between the maps, although meadow as a concept has changed somewhat during the research period. Most of the meadows in Eurajoki had disappeared by 1903-1904, as most of them were ploughed into fields. Measuring the changes in small lakes between the historical spatial data and the modern land use information was more challenging, as the precision of depiction differed so much between the products. Comparing wetlands was also a very difficult task because the way of classifying and depicting them varied from one historical map to another. The basic structure of settlements has been maintained during the whole study period, although the emphasis of housing in different parts of the municipality has changed somewhat. The basic composition of the road network has also remained throughout the study period. The only clear new feature in the landscape is the highway that was built in the 1950s. (orig.)
Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

Energy Technology Data Exchange (ETDEWEB)

Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

2013-03-08

Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.
Developing national on-line services to annotate and analyse underwater imagery in a research cloud

Science.gov (United States)

Proctor, R.; Langlois, T.; Friedman, A.; Davey, B.

2017-12-01

Fish image annotation data is currently collected by various research, management and academic institutions globally (+100,000's hours of deployments) with varying degrees of standardisation and limited formal collaboration or data synthesis. We present a case study of how national on-line services, developed within a domain-oriented research cloud, have been used to annotate habitat images and synthesise fish annotation data sets collected using Autonomous Underwater Vehicles (AUVs) and baited remote underwater stereo-video (stereo-BRUV). Two developing software tools have been brought together in the marine science cloud to provide marine biologists with a powerful service for image annotation. SQUIDLE+ is an online platform designed for exploration, management and annotation of georeferenced images & video data. It provides a flexible annotation framework allowing users to work with their preferred annotation schemes. We have used SQUIDLE+ to sample the habitat composition and complexity of images of the benthos collected using stereo-BRUV. GlobalArchive is designed to be a centralised repository of aquatic ecological survey data with design principles including ease of use, secure user access, flexible data import, and the collection of any sampling and image analysis information. To easily share and synthesise data we have implemented data sharing protocols, including Open Data and synthesis Collaborations, and a spatial map to explore global datasets and filter to create a synthesis. These tools in the science cloud, together with a virtual desktop analysis suite offering python and R environments offer an unprecedented capability to deliver marine biodiversity information of value to marine managers and scientists alike.
The visual-landscape analysis during the integration of high-rise buildings within the historic urban environment

OpenAIRE

Akristiniy Vera A.; Dikova Elena A.

2018-01-01

The article is devoted to one type of urban planning study: the visual-landscape analysis carried out during the integration of high-rise buildings within the historic urban environment, for the purposes of providing pre-design and design studies in terms of preserving the historical urban environment and the implementation of the reconstructional resource of the area. The article formulates and systematizes the stages and methods of conducting the visual-landscape analysis taking into account t...
Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus.

Science.gov (United States)

Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Discovering gene annotations in biomedical text databases

Directory of Open Access Journals (Sweden)

Ozsoyoglu Gultekin

2008-03-01

Abstract Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision level of 78% at a recall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exact matching" with the advantage of locating approximate
Annotating the human genome with Disease Ontology

Science.gov (United States)

Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

2009-01-01

Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
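The recall and precision figures reported in the abstract above follow the usual set-based definitions. A minimal sketch of that calculation is given below; the gene and disease identifiers are hypothetical placeholders for illustration, not data from the study, and the function name is an assumption.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotation pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted annotations that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard annotations that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gene-disease pairs, for illustration only.
gold = {
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_2"),
    ("GENE_C", "disease_3"),
    ("GENE_D", "disease_4"),
}
predicted = {
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_2"),
    ("GENE_C", "disease_9"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating each annotation as a (gene, disease) pair keeps the comparison independent of how many diseases a single gene is linked to.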
Annotating functional RNAs in genomes using Infernal.

Science.gov (United States)

Nawrocki, Eric P

2014-01-01

Many different types of functional non-coding RNAs participate in a wide range of important cellular functions but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools utilize covariance models (CMs), statistical models of the conserved sequence, and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.
MIPS bacterial genomes functional annotation benchmark dataset.

Science.gov (United States)

Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

2005-05-15

Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
Longwave infrared observation of urban landscapes

Science.gov (United States)

Goward, S. N.

1981-01-01

An investigation is conducted regarding the feasibility to develop improved methods for the identification and analysis of urban landscapes on the basis of a utilization of longwave infrared observations. Attention is given to landscape thermal behavior, urban thermal properties, modeled thermal behavior of pavements and buildings, and observed urban landscape thermal emissions. The differential thermal behavior of buildings, pavements, and natural areas within urban landscapes is found to suggest that integrated multispectral solar radiant reflectance and terrestrial radiant emissions data will significantly increase potentials for analyzing urban landscapes. In particular, daytime satellite observations of the considered type should permit better identification of urban areas and an analysis of the density of buildings and pavements within urban areas. This capability should enhance the utility of satellite remote sensor data in urban applications.
Systems Theory and Communication. Annotated Bibliography.

Science.gov (United States)

Covington, William G., Jr.

This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)
The surplus value of semantic annotations

NARCIS (Netherlands)

Marx, M.

2010-01-01

We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries,
Annotation-based enrichment of Digital Objects using open-source frameworks

Directory of Open Access Journals (Sweden)

Marcus Emmanuel Barnes

2017-07-01

The W3C Web Annotation Data Model, Protocol, and Vocabulary unify approaches to annotations across the web, enabling their aggregation, discovery and persistence over time. In addition, new JavaScript libraries provide the ability for users to annotate multi-format content. In this paper, we describe how we have leveraged these developments to provide annotation features alongside Islandora’s existing preservation, access, and management capabilities. We also discuss our experience developing with the Web Annotation Model as an open web architecture standard, as well as our approach to integrating mature external annotation libraries. The resulting software (the Web Annotation Utility Module for Islandora) accommodates annotation across multiple formats. This solution can be used in various digital scholarship contexts.
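For readers unfamiliar with the W3C Web Annotation Data Model mentioned above, the following sketch builds a minimal annotation as JSON-LD. The `@context`, `type`, `body`, and `target` keys come from the W3C model; the `example.org` IRIs, the comment text, and the helper name `make_annotation` are hypothetical, and nothing here is taken from the Islandora module itself.

```python
import json


def make_annotation(anno_id, target_iri, comment):
    """Build a minimal Web Annotation with a plain-text body."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        "target": target_iri,
    }


# Hypothetical identifiers, for illustration only.
annotation = make_annotation(
    "http://example.org/annotations/1",
    "http://example.org/objects/42",
    "Note the marginalia on this page.",
)
print(json.dumps(annotation, indent=2))
```

Because the target is just an IRI, the same structure can point at an image, a text document, or a video, which is what allows annotation across multiple formats.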
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis

Energy Technology Data Exchange (ETDEWEB)

Oesterling, Patrick [Univ. of Leipzig (Germany). Computer Science Dept.; Heine, Christian [Univ. of Leipzig (Germany). Computer Science Dept.; Federal Inst. of Technology (ETH), Zurich (Switzerland). Dept. of Computer Science; Weber, Gunther H. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Scheuermann, Gerik [Univ. of Leipzig (Germany). Computer Science Dept.

2012-05-04

Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

Science.gov (United States)

Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

2016-04-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Heritage landscape structure analysis in surrounding environment of the Grand Canal Yangzhou section

Science.gov (United States)

Xu, Huan

2018-03-01

The Yangzhou section of the Grand Canal is selected for a case study in this paper. The ZY-3 satellite images of 2016 are adopted as the data source. RS and GIS are used to analyze the landscape classification of the surrounding landscape of the Grand Canal, and the classification results are precisely evaluated. Next, the overall features of the landscape pattern are analyzed. The results showed that the overall accuracy is 82.5% and the Kappa coefficient is 78.17% in the Yangzhou section. The producer’s accuracy of the water landscape is the highest, followed by that of the other landscape, farmland landscape, garden and forest landscape, architectural landscape. The user’s accuracy of different landscape types can be ranked in a descending order, as the water landscape, farmland landscape, road landscape, architectural landscape, other landscape, garden and forest landscape. The farmland landscape and the architectural landscape are the top advantageous landscape types of the heritage site. The research findings can provide basic data for landscape protection, management and sustainable development of the Grand Canal Yangzhou section.
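The accuracy measures named in the abstract above (overall accuracy, Kappa coefficient, producer's and user's accuracy) are all derived from a classification confusion matrix. The sketch below shows the standard formulas, assuming rows hold reference classes and columns hold classified classes; the two-class matrix is invented for illustration and is not the study's data.

```python
def accuracy_report(matrix):
    """Compute overall accuracy, Cohen's kappa, and per-class accuracies.

    Assumes matrix[i][j] counts samples of reference class i that were
    classified as class j, and that every class has at least one sample.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (overall - expected) / (1 - expected)
    # Producer's accuracy: correct / reference total (omission errors).
    producers = [d / r for d, r in zip(diagonal, row_totals)]
    # User's accuracy: correct / classified total (commission errors).
    users = [d / c for d, c in zip(diagonal, col_totals)]
    return overall, kappa, producers, users


# Invented two-class example (e.g. water vs. farmland), illustration only.
matrix = [[45, 5], [10, 40]]
overall, kappa, producers, users = accuracy_report(matrix)
print(overall, kappa, producers, users)
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected from the class proportions alone.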
PANNZER2: a rapid functional annotation web server.

Science.gov (United States)

Törönen, Petri; Medlar, Alan; Holm, Liisa

2018-05-08

The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU Public Licence v3.
Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

Science.gov (United States)

Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji

2007-01-01

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
Spatial Analysis of Agricultural Landscape and Hymenoptera Biodiversity at Cianjur Watershed

Directory of Open Access Journals (Sweden)

YAHERWANDI

2006-12-01

Hymenoptera is one of the four largest insect orders (the other three are Coleoptera, Diptera, and Lepidoptera). There are currently over 115,000 described Hymenoptera species. It is clear that Hymenoptera is one of the major components of insect biodiversity. However, Hymenoptera biodiversity is affected by ecology, environment, and ecosystem management. In agricultural areas, the spatial structure, habitat diversity, and habitat composition may vary from cleared landscapes to structurally rich landscapes. Thus, it is very likely that such large-scale spatial patterns (landscape effects) may influence local biodiversity and ecological functions. Therefore, the objective of this research was to study the diversity and configuration elements of agricultural landscapes at Cianjur Watershed with geographical information systems (GIS) and their influence on Hymenoptera biodiversity. The structural differences between the agricultural landscapes of Nyalindung, Gasol, and Selajambe were characterized with Patch Analyst in ArcView 3.2 using digital land use data. Results indicated that the land use classes of the Cianjur Watershed landscape were housing, mixed gardens, talun and rice, vegetable, and corn fields. Landscape structure influenced the biodiversity of Hymenoptera. Species richness and species diversity were higher in the Nyalindung landscape compared to the Gasol and Selajambe landscapes.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator

Science.gov (United States)

Seyed, P.; Chastain, K.; McGuinness, D. L.

2013-12-01

library of vocabularies to assist the user in locating terms to describe observed entities, their properties, and relationships. The Annotator leverages vocabulary definitions of these concepts to guide the user in describing data in a logically consistent manner. The vocabularies made available through the Annotator are open, as is the Annotator itself. We have taken a step towards making semantic annotation/translation of data more accessible. Our vision for the Annotator is as a tool that can be integrated into a semantic data 'workbench' environment, which would allow semantic annotation of a variety of data formats, using standard vocabularies. These vocabularies involved enable search for similar datasets, and integration with any semantically-enabled applications for analysis and visualization.
AnnoLnc: a web server for systematically annotating novel human lncRNAs.

Science.gov (United States)

Hou, Mei; Tang, Xing; Tian, Feng; Shi, Fangyuan; Liu, Fenglin; Gao, Ge

2016-11-16

Long noncoding RNAs (lncRNAs) have been shown to play essential roles in almost every important biological process through multiple mechanisms. Although the repertoire of human lncRNAs has rapidly expanded, their biological function and regulation remain largely elusive, calling for a systematic and integrative annotation tool. Here we present AnnoLnc ( http://annolnc.cbi.pku.edu.cn ), a one-stop portal for systematically annotating novel human lncRNAs. Based on more than 700 data sources and various tool chains, AnnoLnc enables a systematic annotation covering genomic location, secondary structure, expression patterns, transcriptional regulation, miRNA interaction, protein interaction, genetic association and evolution. An intuitive web interface is available for interactive analysis through both desktops and mobile devices, and programmers can further integrate AnnoLnc into their pipeline through standard JSON-based Web Service APIs. To the best of our knowledge, AnnoLnc is the only web server to provide on-the-fly and systematic annotation for newly identified human lncRNAs. Compared with similar tools, the annotation generated by AnnoLnc covers a much wider spectrum with intuitive visualization. Case studies demonstrate the power of AnnoLnc in not only rediscovering known functions of human lncRNAs but also inspiring novel hypotheses.
Correction of the Caulobacter crescentus NA1000 genome annotation.

Directory of Open Access Journals (Sweden)

Bert Ely

Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein-coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
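The third codon position GC content (often called GC3) that the abstract above relies on can be computed directly from a coding sequence. The following is a minimal sketch under the assumption that the sequence is given in frame, starting at the first base of the first codon; the example sequence is invented, and the function is not taken from Artemis or MICheck.

```python
def gc3_content(cds):
    """Return the fraction of G or C bases at third codon positions.

    Assumes `cds` is an in-frame coding sequence; any trailing bases that
    do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    if not third_positions:
        return 0.0
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Invented example: codons ATG GCC AAG TAA have third bases G, C, G, A.
example = "ATGGCCAAGTAA"
print(gc3_content(example))
```

In a GC-rich genome, a true coding frame tends to show a markedly higher GC3 than the other two frames, which is why a frame plot of this value can help confirm or reject a predicted gene.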
Annotation of regular polysemy and underspecification

DEFF Research Database (Denmark)

Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

2013-01-01

We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...
Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

Science.gov (United States)

Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

2016-01-01

We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.
A semi-automatic annotation tool for cooking video

Science.gov (United States)

Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

2013-03-01

In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotating the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.

Performance of single and multi-atlas based automated landmarking methods compared to expert annotations in volumetric microCT datasets of mouse mandibles.

Science.gov (United States)

Young, Ryan; Maga, A Murat

2015-01-01

Here we present an application of the advanced registration and atlas-building framework DRAMMS to the automated annotation of mouse mandibles through a series of tests using single and multi-atlas segmentation paradigms, and compare the outcomes to the current gold standard, manual annotation. Our results showed that the multi-atlas annotation procedure yields landmark precisions within the human observer error range. The mean shape estimates from the gold standard and the multi-atlas annotation procedure were statistically indistinguishable for both Euclidean Distance Matrix Analysis (mean form matrix) and Generalized Procrustes Analysis (Goodall F-test). Further research needs to be done to validate the consistency of variance-covariance matrix estimates from both methods with larger sample sizes. The multi-atlas annotation procedure shows promise as a framework to facilitate truly high-throughput phenomic analyses by channeling investigators' efforts to annotate only a small portion of their datasets.
Experiments with crowdsourced re-annotation of a POS tagging data set

DEFF Research Database (Denmark)

Hovy, Dirk; Plank, Barbara; Søgaard, Anders

2014-01-01

Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks.
MPEG-7 based video annotation and browsing

Science.gov (United States)

Hoeynck, Michael; Auweiler, Thorsten; Wellhausen, Jens

2003-11-01

The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionalities. Since manual video annotation can be time consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7 based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Having determined single shot positions as well as the visual highlights, the information is jointly stored with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries which can be utilized in a user interface in order to provide content-based access to the video stream, and also for media browsing on a streaming server.
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

Science.gov (United States)

Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

2015-12-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study.
Combining gene prediction methods to improve metagenomic gene annotation

Directory of Open Access Journals (Sweden)

Rosen Gail L

2011-01-01

Background: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results: We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
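The combination rule stated in the conclusions can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name, the dictionary layout, and the placeholder key "ThirdPredictor" (the abstract names only GeneMark and Orphelia) are assumptions, and the single 500 bp cutoff is also an assumption, since the abstract does not say what to do between 400 and 500 bp.

```python
from typing import Dict


def combine_predictions(calls: Dict[str, bool], read_length: int) -> bool:
    """Decide whether a read is called coding from per-program calls.

    `calls` maps a program name to its coding/non-coding call for one read.
    The 500 bp cutoff is an assumption made for this sketch.
    """
    if read_length >= 500:
        # Longer reads: intersection of GeneMark and Orphelia.
        return calls["GeneMark"] and calls["Orphelia"]
    # Shorter reads: simple majority vote over all available programs.
    votes = sum(calls.values())
    return votes > len(calls) / 2


# Hypothetical calls for one read; "ThirdPredictor" is a placeholder name.
example = {"GeneMark": True, "Orphelia": False, "ThirdPredictor": True}
short_read_call = combine_predictions(example, read_length=100)
long_read_call = combine_predictions(example, read_length=500)
```

With these hypothetical calls, the short read is accepted by two votes out of three, while the long read is rejected because GeneMark and Orphelia disagree.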
Agrarian Landscape Management in a Modernized World

DEFF Research Database (Denmark)

Christensen, Andreas Aagaard

(1) A historical analysis of social drivers of land use change affecting agrarian landscapes in the Western world in the period 1700-2000 based on a literature review of modernization theory applied to two local scale historical case studies of changes in landscape structure; (2) A national scale analysis based on archival and cartographic sources of the way selected modernization processes affected rural land use patterns in New Zealand in the period from its first European colonial exploration in the 17th century until the present; (3) A global scale analysis of historical patterns of modernization affecting rural land use patterns within the Western world based on historical cartographic evidence; (4) A local scale analysis of the decision making practices of landscape managers in four modern case landscapes in Denmark and New Zealand, based on interview surveys conducted in 2011 and 2012. Findings indicate...
Exploring Energy Landscapes

Science.gov (United States)

Wales, David J.

2018-04-01

Recent advances in the potential energy landscapes approach are highlighted, including both theoretical and computational contributions. Treating the high dimensionality of molecular and condensed matter systems of contemporary interest is important for understanding how emergent properties are encoded in the landscape and for calculating these properties while faithfully representing barriers between different morphologies. The pathways characterized in full dimensionality, which are used to construct kinetic transition networks, may prove useful in guiding such calculations. The energy landscape perspective has also produced new procedures for structure prediction and analysis of thermodynamic properties. Basin-hopping global optimization, with alternative acceptance criteria and generalizations to multiple metric spaces, has been used to treat systems ranging from biomolecules to nanoalloy clusters and condensed matter. This review also illustrates how all this methodology, developed in the context of chemical physics, can be transferred to landscapes defined by cost functions associated with machine learning.
Ground Truth Annotation in T Analyst

DEFF Research Database (Denmark)

2015-01-01

This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...
Using Landscape metrics to analyze the landscape evolution under land abandonment

Science.gov (United States)

Pelorosso, Raffaele; Della Chiesa, Stefano; Gobattoni, Federica; Leone, Antonio

2010-05-01

view, therefore, not all the abandoned land will be covered by woods, even after a reasonable time (e.g., 20-30 years); patches of open area can persist over time as a consequence of different (more or less natural) disturbances, producing a landscape mosaic and vegetation pattern that is almost never completely homogeneous. This spatial and temporal differentiation of landscape pattern therefore requires that both the disturbances and their effect on the land abandonment process be identified and analyzed for each landscape. Many types of analysis and models have been developed and used to understand the reasons for abandonment, its evolution, likely future landscape scenarios, and the main consequences for environment and population, in order to establish appropriate land uses that obtain suitable and sustainable goods and services from the landscape itself. One such analysis relies on landscape metrics. Landscape metrics have been widely applied in ecology and landscape ecology (Rainis, 2003; Romero-Calcerrada and Perry, 2004; Narumalani et al., 2004; Rocchini et al., 2006) because they allow an objective description of the temporal pattern of landscape change and a comparison with other landscapes (Turner et al., 2001). Furthermore, a description of the shape, size and spatial arrangement of patches of vegetation can be used to link the observed pattern with the ecological processes that may have generated it (Rocchini et al., 2006). These metrics can therefore be used to see how an abandoned landscape can evolve under the effects of different constraints that, even if not completely known, have been affecting its present configuration. Using historical and recent aerial photos (1954, 1985, 1999) and several landscape metrics, the evolution of a marginal municipality of the central Apennines under abandonment is presented here. The temporal evolution of landscape metrics is discussed to underline the importance of such descriptors of vegetation pattern dynamics and the key role played by these
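As a rough illustration of what a patch-based landscape metric measures, the sketch below counts patches of one land-cover class on a small raster and reports their sizes. It is not taken from the study: the toy grid, the class codes (1 = open area, 0 = woodland), and the 4-neighbour definition of a patch are all assumptions made for the example.

```python
from collections import deque
from typing import List


def patch_sizes(grid: List[List[int]], target: int) -> List[int]:
    """Return the size (in cells) of every 4-connected patch of `target`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            # Breadth-first flood fill over one patch.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Hypothetical 3x3 land-cover map: 1 = open area, 0 = woodland.
toy_map = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
]
open_patches = patch_sizes(toy_map, target=1)
number_of_patches = len(open_patches)
mean_patch_size = sum(open_patches) / number_of_patches
```

Running the same function on classified maps from different years (for example 1954, 1985 and 1999) would give a simple time series of patch number and mean patch size, which is the kind of temporal comparison the abstract describes.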
Gene calling and bacterial genome annotation with BG7.

Science.gov (United States)

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Annotation of the Evaluative Language in a Dependency Treebank

Directory of Open Access Journals (Sweden)

Šindlerová Jana

2017-12-01

In the paper, we present our efforts to annotate evaluative language in the Prague Dependency Treebank 2.0. The project is a follow-up of the series of annotations of small plaintext corpora. It uses automatic identification of potentially evaluative nodes through mapping a Czech subjectivity lexicon to syntactically annotated data. These nodes are then manually checked by an annotator and either dismissed as standing in a non-evaluative context, or confirmed as evaluative. In the latter case, information about the polarity orientation, the source and target of evaluation is added by the annotator. The annotations unveiled several advantages and disadvantages of the chosen framework. The advantages involve more structured and easy-to-handle environment for the annotator, visibility of syntactic patterning of the evaluative state, effective solving of discontinuous structures or a new perspective on the influence of good/bad news. The disadvantages include little capability of treating cases with evaluation spread among more syntactically connected nodes at once, little capability of treating metaphorical expressions, or disregarding the effects of negation and intensification in the current scheme.
Annotating gene sets by mining large literature collections with protein networks.

Science.gov (United States)

Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

2018-01-01

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Landscape Character of Pongkor Mining Ecotourism Area

Science.gov (United States)

Kusumoarto, A.; Gunawan, A.; Machfud; Hikmat, A.

2017-10-01

Pongkor Mining Ecotourism Area has a diverse landscape character, which is a potential landscape resource for the development of an ecotourism destination. This area is part of the Mount of Botol Resort, Halimun Salak National Park (HSNP). The area also has fairly high biodiversity. This study aims to identify and analyze the categories of landscape character in the Pongkor Mining Ecotourism Area for the development of an ecotourism destination. The study used a descriptive approach based on field surveys and interviews and was carried out in two steps: 1) identification of the landscape character, and 2) analysis of the landscape character. The results showed that, in the areas set aside for the ecotourism destination in Pongkor Mining, the landscape character categories are forest, tailing ponds, river, plain, and the built environment. The most dominant landscape character category in the area is forest, followed by river, plain, tailing ponds, and the built environment. Landscape character in a natural environment is the most preferred for ecotourism activities. The landscape character found in the natural environment and the built environment is a potential that must be protected and modified, for example through elimination of incongruous elements, accentuation of natural form, alteration of the natural form, and intensification and enhancement of visual quality, so that the area can be developed as an ecotourism destination.
The caBIG annotation and image Markup project.

Science.gov (United States)

Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Sepukar, Kastubh; Rubin, Daniel L

2010-04-01

Image annotation and markup are at the core of medical interpretation in both the clinical and the research setting. Digital medical images are managed with the DICOM standard format. While DICOM contains a large amount of meta-data about by whom, where, and how the image was acquired, DICOM says little about the content or meaning of the pixel data. An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human or machine observer. An image markup is the graphical symbols placed over the image to depict an annotation. While DICOM is the standard for medical image acquisition, manipulation, transmission, storage, and display, there are no standards for image annotation and markup. Many systems expect annotation to be reported verbally, while markups are stored in graphical overlays or proprietary formats. This makes it difficult to extract and compute with both of them. The goal of the Annotation and Image Markup (AIM) project is to develop a mechanism for modeling, capturing, and serializing image annotation and markup data that can be adopted as a standard by the medical imaging community. The AIM project produces both human- and machine-readable artifacts. This paper describes the AIM information model, schemas, software libraries, and tools so as to prepare researchers and developers for their use of AIM.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

Science.gov (United States)

Merchant, Nirav

2016-01-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
Interoperable Multimedia Annotation and Retrieval for the Tourism Sector

NARCIS (Netherlands)

Chatzitoulousis, Antonios; Efraimidis, Pavlos S.; Athanasiadis, I.N.

2015-01-01

The Atlas Metadata System (AMS) employs semantic web annotation techniques in order to create an interoperable information annotation and retrieval platform for the tourism sector. AMS adopts state-of-the-art metadata vocabularies, annotation techniques and semantic web technologies.
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

Science.gov (United States)

Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E

2017-08-17

Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not
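Several of the scores quoted in this record are B3 (B-cubed) values. As a reminder of what that metric computes, here is a minimal Python sketch of the standard mention-level B3 precision, recall and F-measure. It is not the evaluation code used for CRAFT; the function name and the toy chains are hypothetical, and the sketch assumes that the gold (key) and system (response) chains cover exactly the same mentions.

```python
from typing import List, Set, Tuple


def b_cubed(key: List[Set[str]], response: List[Set[str]]) -> Tuple[float, float, float]:
    """Mention-averaged B3 precision, recall and F-measure.

    Assumes every mention appears in exactly one key chain and one
    response chain (a simplification made for this sketch).
    """
    key_of = {m: chain for chain in key for m in chain}
    resp_of = {m: chain for chain in response for m in chain}
    mentions = list(key_of)
    precision = recall = 0.0
    for m in mentions:
        overlap = len(key_of[m] & resp_of[m])
        precision += overlap / len(resp_of[m])
        recall += overlap / len(key_of[m])
    precision /= len(mentions)
    recall /= len(mentions)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical gold and system chains over four mentions a-d.
gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c", "d"}]
p, r, f = b_cubed(gold, system)
```

For these toy chains the precision is 0.75 and the recall is 2/3, so the F-measure is about 0.71. Because B3 averages over mentions while entity-based CEAF aligns whole chains, the two can differ widely on the same annotations, which is consistent with the range of agreement figures reported above.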
A Novel Approach to Semantic and Coreference Annotation at LLNL

Energy Technology Data Exchange (ETDEWEB)

Firpo, M

2005-02-04

A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.
An automated annotation tool for genomic DNA sequences using

Indian Academy of Sciences (India)

Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...
Combined evidence annotation of transposable elements in genome sequences.

Directory of Open Access Journals (Sweden)

Hadi Quesneville

2005-07-01

Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments of a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
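One simple way to picture "combined evidence" is to merge overlapping hits reported by different programs into a single candidate region while remembering which programs supported it. The sketch below does only that; it is not the published pipeline, whose actual combination rules are not given in this abstract. The coordinates are invented, and the program names are used purely as labels.

```python
from typing import List, Set, Tuple

Hit = Tuple[int, int, str]            # (start, end, source program)
Region = Tuple[int, int, Set[str]]    # (start, end, supporting programs)


def merge_evidence(hits: List[Hit]) -> List[Region]:
    """Merge overlapping hits and collect the programs that support each region."""
    merged: List[Region] = []
    for start, end, source in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it and record the source.
            prev_start, prev_end, sources = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), sources | {source})
        else:
            merged.append((start, end, {source}))
    return merged


# Hypothetical hits on one sequence.
hits = [
    (100, 200, "RepeatMasker"),
    (150, 260, "RECON"),
    (400, 450, "TE-HMM"),
]
regions = merge_evidence(hits)
```

Here the first two hits collapse into one region spanning 100 to 260 with two supporting programs, and the third stays separate. A curator could then prioritise regions by how many independent sources support them.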

NoGOA: predicting noisy GO annotations using evidences and sparse representation.

Science.gov (United States)

Yu, Guoxian; Lu, Chang; Wang, Jun

2017-07-21

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still a considerable number of noisy (or incorrect) annotations. Given the wide application of annotations, how to identify noisy annotations is an important but seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations.
Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA
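The neighbourhood-voting step can be illustrated with a small Python sketch. This is a deliberately simplified stand-in, not NoGOA itself: it takes gene-gene similarities as given (NoGOA derives them from sparse representation coefficients), ignores evidence codes and the GO hierarchy, and uses hypothetical gene names, term labels, and a hypothetical neighbourhood size k.

```python
from typing import Dict, Set


def neighbor_votes(
    gene: str,
    annotations: Dict[str, Set[str]],
    similarity: Dict[str, float],
    k: int = 2,
) -> Dict[str, float]:
    """Score each term of `gene` by similarity-weighted support from its k nearest neighbours.

    A score near 0 means few similar genes share the term, so the
    annotation is a candidate for being noisy.
    """
    neighbors = sorted(
        (g for g in similarity if g != gene),
        key=lambda g: similarity[g],
        reverse=True,
    )[:k]
    total = sum(similarity[g] for g in neighbors)
    votes = {}
    for term in annotations[gene]:
        support = sum(similarity[g] for g in neighbors if term in annotations.get(g, set()))
        votes[term] = support / total if total else 0.0
    return votes


# Hypothetical annotations and similarities of other genes to g1.
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A"},
    "g3": {"GO:A", "GO:C"},
    "g4": {"GO:B"},
}
similarity_to_g1 = {"g2": 0.9, "g3": 0.6, "g4": 0.1}
votes = neighbor_votes("g1", annotations, similarity_to_g1, k=2)
```

In this toy case both nearest neighbours carry GO:A and neither carries GO:B, so GO:B would be flagged for review. In the method described above, such votes are only one ingredient and are combined with evidence-code weights before a final prediction is made.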
Plann: A command-line application for annotating plastome sequences.

Science.gov (United States)

Huang, Daisie I; Cronk, Quentin C B

2015-08-01

Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.
Sources to the landscape - detailed spatiotemporal analysis of 200 years Danish landscape dynamics using unexploited historical maps and aerial photos

DEFF Research Database (Denmark)

Svenningsen, Stig Roar; Christensen, Andreas Aagaard; Dupont, Henrik

to declassification of military maps and aerial photos from the Cold War, only relatively few sources have been made available to researchers due to lacking efforts in digitalization and related services. And even though the digitizing of cartographic material has been accelerated, the digitally available materials...... or to the commercial photo series from the last 20 years. This poster outlines a new research project focusing on the potential of unexploited cartographic sources for detailed analysis of the dynamics of the Danish landscape between 1800 and 2000. The project draws on cartographic sources available in Danish archives...... of material in landscape change studies giving a high temporal and spatial resolution. The project also deals with the opportunities and constraints of comparing different cartographic sources with diverse purposes and times of production, e.g. different scale and quality of aerial photos or the difference between...
Semantator: annotating clinical narratives with semantic web ontologies.

Science.gov (United States)

Song, Dezhao; Chute, Christopher G; Tao, Cui

2012-01-01

To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manually annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfactory. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.
Pathway enrichment analysis approach based on topological structure and updated annotation of pathway.

Science.gov (United States)

Yang, Qian; Wang, Shuyuan; Dai, Enyu; Zhou, Shunheng; Liu, Dianming; Liu, Haizhou; Meng, Qianqian; Jiang, Bin; Jiang, Wei

2017-08-16

Pathway enrichment analysis has been widely used to identify cancer risk pathways, and contributes to elucidating the mechanism of tumorigenesis. However, most of the existing approaches use outdated pathway information and neglect the complex gene interactions within pathways. Here, we first briefly reviewed the existing widely used pathway enrichment analysis approaches, and then proposed a novel topology-based pathway enrichment analysis (TPEA) method, which integrates topological properties and global upstream/downstream positions of genes in pathways. We compared TPEA with four widely used pathway enrichment analysis tools, including database for annotation, visualization and integrated discovery (DAVID), gene set enrichment analysis (GSEA), centrality-based pathway enrichment (CePa) and signaling pathway impact analysis (SPIA), by analyzing six gene expression profiles of three tumor types (colorectal cancer, thyroid cancer and endometrial cancer). As a result, we identified several well-known cancer risk pathways that could not be obtained by the existing tools, and the results of TPEA were more stable than those of the other tools in analyzing different data sets of the same cancer. Ultimately, we developed an R package to implement TPEA, which can update KEGG pathway information online and is available at the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/TPEA/.
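For readers unfamiliar with what a non-topological enrichment test looks like, the sketch below computes the classic hypergeometric over-representation p-value for one pathway. It is offered only as a baseline illustration and is not the TPEA method, which additionally uses pathway topology; the function name, argument names and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, pathway: int, selected: int, hits: int) -> float:
    """Probability of seeing at least `hits` pathway genes among `selected` genes.

    universe: number of genes considered in total
    pathway:  number of those genes that belong to the pathway
    selected: number of genes in the list of interest (e.g. differentially expressed)
    hits:     number of selected genes that belong to the pathway
    """
    upper = min(pathway, selected)
    # Sum the hypergeometric tail as an exact integer before dividing.
    numerator = sum(
        comb(pathway, i) * comb(universe - pathway, selected - i)
        for i in range(hits, upper + 1)
    )
    return numerator / comb(universe, selected)


# Hypothetical counts: 20 genes in total, 5 in the pathway, 4 selected, 3 of them in the pathway.
p_value = enrichment_p_value(universe=20, pathway=5, selected=4, hits=3)
```

A test of this kind treats every gene in a pathway as interchangeable, which is exactly the limitation that topology-based approaches such as the one described above set out to address.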
MEETING: Chlamydomonas Annotation Jamboree - October 2003

Energy Technology Data Exchange (ETDEWEB)

Grossman, Arthur R

2007-04-13

Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al., 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual
Analysis of the development of land use in the Morava River floodplain, with special emphasis on the landscape matrix

Directory of Open Access Journals (Sweden)

Kilianová Helena

2017-03-01

This article presents the results of a GIS-based analysis of land use development in the Morava River floodplain (Czech Republic) from 1836 to the present. The results are based on the analysis of historical maps, using the landscape matrix assessment of the Morava River floodplain. The final analyses were processed from land use maps of the floodplain at a scale of 1 : 25,000 in five time horizons. These maps were compared with the present state of the landscape by GIS methods. The study area was assessed according to five geomorphological areas from the northern/higher part to the southern/lower part of the floodplain. In 1836 the landscape matrix of the floodplain was composed of meadows and forests. Forest components decreased minimally but the changes are more important. The grassland area (meadows and pastures) decreased but arable land, as well as settlements, increased very significantly. In the 1950s the landscape matrix was composed of a mosaic of alluvial forests, meadows and arable land. Currently, the predominant landscape matrix consists of arable land and isolated forest complexes.
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

International Nuclear Information System (INIS)

Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

2015-01-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)
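The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula; the function name and the counts are illustrative assumptions and do not reproduce the authors' evaluation data.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    (an assumption here) that avoids division by zero.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts: every coding and non-coding sequence classified correctly.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is often reported alongside accuracy.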
Ontology modularization to improve semantic medical image annotation.

Science.gov (United States)

Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul

2011-02-01

Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail, thus making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data contained within should have been previously annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time consuming and labor intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results. Copyright © 2010 Elsevier Inc. All rights reserved.
Landscape Classifications for Landscape Metrics-based Assessment of Urban Heat Island: A Comparative Study

International Nuclear Information System (INIS)

Zhao, X F; Deng, L; Wang, H N; Chen, F; Hua, L Z

2014-01-01

In recent years, some studies have been carried out on the landscape analysis of urban thermal patterns. With the prevalence of thermal landscape analysis, a key problem has emerged: how to classify the thermal landscape into thermal patches. Current research uses different methods of thermal landscape classification, such as the standard deviation method (SD) and the R method. To find out the differences, a comparative study was carried out in Xiamen using a 20-year winter time series of Landsat images. After the retrieval of land surface temperature (LST), the thermal landscape was classified using the two methods separately. Then landscape metrics, 6 at class level and 14 at landscape level, were calculated and analyzed using Fragstats 3.3. We found that: (1) at the class level, all the metrics with the SD method were evened out and did not show an obvious trend along with the process of urbanization, while those with the R method did. (2) At the landscape level, 6 of the 14 metrics kept similar trends, 5 differed at local turning points of the curve, and 3 differed completely in the shape of the curves. (3) When examined with visual interpretation, the SD method tended to exaggerate urban heat island effects more than the R method.
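To make the idea of a standard-deviation classification concrete, the sketch below bins temperature values relative to the scene mean. The abstract does not give the class breaks used in the study, so the three classes and the one-standard-deviation thresholds are assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev


def classify_by_sd(temps, k=1.0):
    """Label each value 'low', 'medium' or 'high' relative to mean +/- k*SD.

    Hypothetical thresholds; the study's actual class breaks are not stated.
    """
    mu = mean(temps)
    sd = pstdev(temps)
    labels = []
    for t in temps:
        if t < mu - k * sd:
            labels.append("low")
        elif t > mu + k * sd:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


# Invented land surface temperatures for five pixels.
classes = classify_by_sd([10.0, 20.0, 30.0, 40.0, 50.0])
```

Because the breaks are tied to each scene's own mean and spread, a scheme of this kind tends to normalize differences between dates, which may relate to the flattened class-level trends reported for the SD method.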
Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

Science.gov (United States)

Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

2004-03-01

One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.
A methodology for creating greenways through multidisciplinary sustainable landscape planning.

Science.gov (United States)

Pena, Selma Beatriz; Abreu, Maria Manuela; Teles, Rui; Espírito-Santo, Maria Dalila

2010-01-01

This research proposes a methodology for defining greenways via sustainable planning. This approach includes the analysis and discussion of culture and natural processes that occur in the landscape. The proposed methodology is structured in three phases: eco-cultural analysis; synthesis and diagnosis; and proposal. An interdisciplinary approach provides an assessment of the relationships between landscape structure and landscape dynamics, which are essential to any landscape management or land use. The landscape eco-cultural analysis provides a biophysical, dynamic (geomorphologic rate), vegetation (habitats from directive 92/43/EEC) and cultural characterisation. The knowledge obtained by this analysis then supports the definition of priority actions to stabilise the landscape and the management measures for the habitats. After the analysis and diagnosis phases, a proposal for the development of sustainable greenways can be achieved. This methodology was applied to a study area of the Azambuja Municipality in the Lisbon Metropolitan Area (Portugal). The application of the proposed methodology to the study area shows that landscape stability is crucial for greenway users in order to appreciate the landscape and its natural and cultural elements in a sustainable and healthy way, both by cycling or by foot. A balanced landscape will increase the value of greenways and in return, they can develop socio-economic activities with benefits for rural communities. Copyright 2009 Elsevier Ltd. All rights reserved.
[Prescription annotations in Welfare Pharmacy].

Science.gov (United States)

Han, Yi

2018-03-01

Welfare Pharmacy contains medical formulas documented by the government and official prescriptions used by the official pharmacy in the pharmaceutical process. In the last years of the Southern Song Dynasty, anonymous authors added many prescription annotations, made textual researches into the name, source, composition and origin of the prescriptions, and supplemented important historical data of medical cases and researched historical facts. The annotations of Welfare Pharmacy gathered the essence of medical theory, and can be used as precious materials to correctly understand the syndrome differentiation, compatibility regularity and clinical application of prescriptions. This article investigated in depth the style and form of the prescription annotations in Welfare Pharmacy, the names of prescriptions and the evolution of terminology, the major functions of the prescriptions, processing methods, instructions for taking medicine and taboos of prescriptions, the medical cases and clinical efficacy of prescriptions, and the backgrounds, sources, composition and cultural meanings of prescriptions, and proposed that the prescription annotations played an active role in the textual dissemination, patent medicine production and clinical diagnosis and treatment of Welfare Pharmacy. This not only helps understand the changes in the names and terms of traditional Chinese medicines in Welfare Pharmacy, but also provides the basis for understanding the knowledge sources, compatibility regularity, important drug innovations and clinical medications of prescriptions in Welfare Pharmacy. Copyright© by the Chinese Pharmaceutical Association.
Annotating abstract pronominal anaphora in the DAD project

DEFF Research Database (Denmark)

Navarretta, Costanza; Olsen, Sussi Anni

2008-01-01

In this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The extended scheme, which we call the DAD annotation scheme, allows to annotate information about abstract anaphora which is important to investigate their use, see Webber (1988), Gundel et al. (2003), Navarretta (2004) and which can influence their automatic treatment. Intercoder agreement scores obtained by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that the information proposed in the scheme can be recognised in a reliable way.
Annotated bibliography

International Nuclear Information System (INIS)

1997-08-01

Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners.
Landscape Builder: Software for the creation of initial landscapes for LANDIS from FIA data

Directory of Open Access Journals (Sweden)

William Dijak

2013-06-01

I developed Landscape Builder to create spatially explicit landscapes as starting conditions for LANDIS Pro 7.0 and LANDIS II landscape forest simulation models from classified satellite imagery and Forest Inventory and Analysis (FIA) data collected over multiple years. LANDIS Pro and LANDIS II models project future landscapes by simulating tree growth, tree species succession, disease, insects, fire, wind, and management disturbance. Landscape Builder uses inventory plot attributes from the FIA inventory database, FIA unit map, National Forest type map, National Forest size class map, land cover map, and landform map to assign FIA plot attributes to raster pixels representing a real forest landscape. In addition to creating a detailed map of current (initial) forest landscape conditions, the software produces specific files required for use in LANDIS Pro 7.0 or LANDIS II format. Other tools include the ability to create a dominant species and age-class map from previously created LANDIS maps, a tool to create a dominant species and age-class map from a stand map and field plot data, and a tool to convert between Esri ASCII rasters and Erdas file format types.
Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

Directory of Open Access Journals (Sweden)

Tie Hua Zhou

2015-05-01

The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on keyword query architecture that models the probability distribution of image annotations, allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated with semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed in heterogeneous large data sources. Our experiments on the SBU (collected by Stony Brook University) database show that (i) our integrated annotation contains higher quality representatives and semantic matches; and (ii) annotation integration can indeed improve image search result quality.
GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

Science.gov (United States)

Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

2013-04-10

Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes that are still being sequenced. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
Impact of Climate Variability and Landscape Patterns on Water Budget and Nutrient Loads in a Peri-urban Watershed: A Coupled Analysis Using Process-based Hydrological Model and Landscape Indices

Science.gov (United States)

Li, Chongwei; Zhang, Yajuan; Kharel, Gehendra; Zou, Chris B.

2018-06-01

Nutrient discharge into peri-urban streams and reservoirs constitutes a significant pressure on environmental management, but quantitative assessment of non-point source pollution under climate variability in fast changing peri-urban watersheds is challenging. Soil and Water Assessment Tool (SWAT) was used to simulate water budget and nutrient loads for landscape patterns representing a 30-year progression of urbanization in a peri-urban watershed near Tianjin metropolis, China. A suite of landscape pattern indices was related to nitrogen (N) and phosphorus (P) loads under dry and wet climate using CANOCO redundancy analysis. The calibrated SWAT model was adequate to simulate runoff and nutrient loads for this peri-urban watershed, with Nash-Sutcliffe coefficient (NSE) and coefficient of determination (R²) > 0.70 and percentage bias (PBIAS) between -7 and +18 for calibration and validation periods. With the progression of urbanization, forest remained the main "sink" landscape while cultivated and urban lands remained the main "source" landscapes with the role of orchard and grassland being uncertain and changing with time. Compared to 1984, the landscape use pattern in 2013 increased nutrient discharge by 10%. Nutrient loads modelled under wet climate were 3-4 times higher than those under dry climate for the same landscape pattern. Results indicate that climate change could impose a far greater impact on runoff and nutrient discharge in a peri-urban watershed than landscape pattern change.
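The two goodness-of-fit statistics quoted above can be computed from paired observed and simulated series. The sketch below uses the textbook definitions; the function names are assumptions, the data are invented, and the PBIAS sign convention (positive means the model underestimates) is one common choice that may differ from the convention the authors used.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def percent_bias(observed, simulated):
    """PBIAS = 100 * sum(obs - sim) / sum(obs); positive = underestimation
    under the sign convention assumed here."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Invented monthly runoff values, not taken from the study.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [0.9, 1.8, 2.7, 3.6]
nse = nash_sutcliffe(obs, sim)
pbias = percent_bias(obs, sim)
```

An NSE of 1 indicates a perfect match and 0 means the model does no better than predicting the observed mean, so the reported values above 0.70 sit toward the well-performing end of that range.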
Quick Pad Tagger : An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers

OpenAIRE

Marc Schreiber; Kai Barkschat; Bodo Kraft; Albert Zundorf

2015-01-01

More and more domain specific applications on the internet make use of Natural Language Processing (NLP) tools (e.g., Information Extraction systems). The output quality of these applications relies on the output quality of the NLP tools used. Often, the quality can be increased by annotating a domain specific corpus. However, annotating a corpus is a time consuming and exhaustive task. To reduce the annotation time we present...
Assessment of features for automatic CTG analysis based on expert annotation.

Science.gov (United States)

Chudácek, Vacláv; Spilka, Jirí; Lhotská, Lenka; Janku, Petr; Koucký, Michal; Huptych, Michal; Bursa, Miroslav

2011-01-01

Cardiotocography (CTG), the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), has been used routinely by obstetricians since the 1960s to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features ever used for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time and frequency domain features, are investigated and assessed based on their statistical significance in the task of classifying the FHR into three FIGO classes. Annotation derived from the panel of experts instead of the commonly utilized pH values was used for evaluation of the features on a large data set (552 records). We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to the meta-analysis of three different ranking methods. The number of accelerations and decelerations, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.
The Viking viewer for connectomics: scalable multi-user annotation and summarization of large volume data sets.

Science.gov (United States)

Anderson, J R; Mohammed, S; Grimm, B; Jones, B W; Koshevoy, P; Tasdizen, T; Whitaker, R; Marc, R E

2011-01-01

Modern microscope automation permits the collection of vast amounts of continuous anatomical imagery in both two and three dimensions. These large data sets present significant challenges for data storage, access, viewing, annotation and analysis. The cost and overhead of collecting and storing the data can be extremely high. Large data sets quickly exceed an individual's capability for timely analysis and present challenges in efficiently applying transforms, if needed. Finally annotated anatomical data sets can represent a significant investment of resources and should be easily accessible to the scientific community. The Viking application was our solution created to view and annotate a 16.5 TB ultrastructural retinal connectome volume and we demonstrate its utility in reconstructing neural networks for a distinctive retinal amacrine cell class. Viking has several key features. (1) It works over the internet using HTTP and supports many concurrent users limited only by hardware. (2) It supports a multi-user, collaborative annotation strategy. (3) It cleanly demarcates viewing and analysis from data collection and hosting. (4) It is capable of applying transformations in real-time. (5) It has an easily extensible user interface, allowing addition of specialized modules without rewriting the viewer. © 2010 The Authors Journal of Microscopy © 2010 The Royal Microscopical Society.
PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

Science.gov (United States)

Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

2015-01-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591
Extending eScience Provenance with User-Submitted Semantic Annotations

Science.gov (United States)

Michaelis, J.; Zednik, S.; West, P.; Fox, P. A.; McGuinness, D. L.

2010-12-01

eScience based systems generate provenance of their data products, related to such things as: data processing, data collection conditions, expert evaluation, and data product quality. Recent advances in web-based technology offer users the possibility of making annotations to both data products and steps in accompanying provenance traces, thereby expanding the utility of such provenance for others. These contributing users may have varying backgrounds, ranging from system experts to outside domain experts to citizen scientists. Furthermore, such users may wish to make varying types of annotations - ranging from documenting the purpose of a provenance step to raising concerns about the quality of data dependencies. Semantic Web technologies allow for such kinds of rich annotations to be made to provenance through the use of ontology vocabularies for (i) organizing provenance, and (ii) organizing user/annotation classifications. Furthermore, through Linked Data practices, Semantic linkages may be made from provenance steps to external data of interest. A desire for Semantically-annotated provenance has been motivated by data management issues in the Mauna Loa Solar Observatory’s (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photometer-based readings are taken of solar activity and subsequently processed into final data products consumable by end users. At intermediate stages of ACOS processing, factors such as evaluations by human experts and weather conditions are logged, which could impact data product quality. If such factors are linked via user-submitted annotations to provenance, it could be significantly beneficial for other users. Likewise, the background of a user could impact the credibility of their annotations. For example, an annotation made by a citizen scientist describing the purpose of a provenance step may not be as reliable as a similar annotation made by an ACOS project member. For this work, we have developed a software package that
Flowscapes : Infrastructure as landscape, landscape as infrastructure. Graduation Lab Landscape Architecture 2012/2013

NARCIS (Netherlands)

Nijhuis, S.; Jauslin, D.; De Vries, C.

2012-01-01

Flowscapes explores infrastructure as a type of landscape and landscape as a type of infrastructure, and is focused on landscape architectonic design of transportation-, green- and water infrastructures. These landscape infrastructures are considered armatures for urban and rural development. With
Harnessing Collaborative Annotations on Online Formative Assessments

Science.gov (United States)

Lin, Jian-Wei; Lai, Yuan-Cheng

2013-01-01

This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…
Crowdsourcing and annotating NER for Twitter #drift

DEFF Research Database (Denmark)

Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

2014-01-01

We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible...
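Agreement figures such as the kappa value quoted above are chance-corrected. The abstract does not say which kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators labelling the same tokens; the function name and the labels are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement.

    Assumes two annotators and equal-length label sequences.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Invented token labels from two hypothetical annotators.
annotator_1 = ["PER", "PER", "O", "O"]
annotator_2 = ["PER", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four tokens (0.75 raw agreement), but half of that agreement is expected by chance, which is why the corrected score is lower than the raw proportion.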
SNAD: sequence name annotation-based designer

Directory of Open Access Journals (Sweden)

Gorbalenya Alexander E

2009-08-01

Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.
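The template-directed conversion described in this abstract can be illustrated with a minimal Python sketch. It is not SNAD's implementation: the UIDs, the annotation fields (`acronym`, `strain`), the template syntax, and the function names are all hypothetical, and the annotation records are hard-coded here where a real tool would fetch them from external database entries.

```python
import re

# Hypothetical annotation records keyed by sequence UID (illustrative only).
ANNOTATIONS = {
    "UID0001": {"species": "Example virus A", "acronym": "EVA", "strain": "s1"},
    "UID0002": {"species": "Example virus B", "acronym": "EVB", "strain": "s2"},
}


def render_name(uid, template, annotations):
    """Fill the template from the UID's annotation; fall back to the UID."""
    record = annotations.get(uid)
    if record is None:
        return uid
    try:
        return template.format(**record)
    except KeyError:
        # The template asks for a field this record lacks: keep the UID.
        return uid


def rename_uids(text, template, annotations):
    """Replace every known UID found in text (tree, alignment, or list)."""
    if not annotations:
        return text
    # Longest UIDs first, so a UID that is a prefix of another cannot shadow it.
    uids = sorted(annotations, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(uid) for uid in uids))
    return pattern.sub(
        lambda match: render_name(match.group(0), template, annotations), text
    )


newick = "(UID0001:0.1,UID0002:0.2);"
print(rename_uids(newick, "{acronym}_{strain}", ANNOTATIONS))
```

Falling back to the original UID when a field is missing keeps the output traceable, which matters when names are substituted into trees or alignments that downstream tools will parse.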
ACID: annotation of cassette and integron data

Directory of Open Access Journals (Sweden)

Stokes Harold W

2009-04-01

Full Text Available Abstract Background Although integrons and their associated gene cassettes are present in ~10% of bacteria and can represent up to 3% of the genome in which they are found, very few have been properly identified and annotated in public databases. These genetic elements have been overlooked in comparison to other vectors that facilitate lateral gene transfer between microorganisms. Description By automating the identification of integron integrase genes and of the non-coding cassette-associated attC recombination sites, we were able to assemble a database containing all publicly available sequence information regarding these genetic elements. Specialists manually curated the database and this information was used to improve the automated detection and annotation of integrons and their encoded gene cassettes. ACID (annotation of cassette and integron data can be searched using a range of queries and the data can be downloaded in a number of formats. Users can readily annotate their own data and integrate it into ACID using the tools provided. Conclusion ACID is a community resource providing easy access to annotations of integrons and making tools available to detect them in novel sequence data. ACID also hosts a forum to prompt integron-related discussion, which can hopefully lead to a more universal definition of this genetic element.
Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

Science.gov (United States)

van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

2014-03-01

For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.
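As a rough illustration of selecting a model by predictive accuracy, the following Python sketch runs standard leave-one-out cross-validation on single-predictor linear models. The abstract refers to a modified variant whose details are not given here, so this is the plain, unmodified procedure; the predictor names and all numbers are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mse(xs, ys):
    """Mean squared prediction error with each observation held out once."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


def select_predictor(candidates, ys):
    """Return the name of the predictor with the lowest LOOCV error."""
    return min(candidates, key=lambda name: loocv_mse(candidates[name], ys))


# Invented data: a gene-flow response and two hypothetical landscape metrics.
gene_flow = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "habitat_cover": [1.0, 2.0, 3.0, 4.0, 5.0],
    "unrelated_metric": [2.0, 1.0, 5.0, 3.0, 4.0],
}
print(select_predictor(candidates, gene_flow))
```

Scoring each model only on observations it was not fitted to is what makes the comparison a measure of predictive accuracy and not of in-sample fit.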
A Landscape Analysis to Understand Orientation of Honey Bee (Hymenoptera: Apidae) Drones in Puerto Rico.

Science.gov (United States)

Galindo-Cardona, A; Monmany, A C; Diaz, G; Giray, T

2015-08-01

Honey bees [Apis mellifera L. (Apidae, Hymenoptera)] show spatial learning behavior or orientation, in which animals make use of structured home ranges for their daily activities. Worker (female) orientation has been studied more extensively than drone (male) orientation. Given the extensive and large flight range of drones as part of their reproductive biology, the study of drone orientation may provide new insight on landscape features important for orientation. We report the return rate and orientation of drones released at three distances (1, 2, and 4 km) and at the four cardinal points from an apiary located in Gurabo, Puerto Rico. We used high-resolution aerial photographs to describe landscape characteristics at the releasing sites and at the apiary. Analyses of variance were used to test significance among returning times from different distances and directions. A principal components analysis was used to describe the landscape at the releasing sites and generalized linear models were used to identify landscape characteristics that influenced the returning times of drones. Our results showed for the first time that drones are able to return from as far as 4 km from the colony. Distance to drone congregation area, orientation, and tree lines were the most important landscape characteristics influencing drone return rate. We discuss the role of landscape in drone orientation. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences

Directory of Open Access Journals (Sweden)

Spraggins Thomas A

2007-04-01

Full Text Available Abstract Background Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the
Comprehensive Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems Using the R-Package flacco

OpenAIRE

Kerschke, Pascal

2017-01-01

Choosing the best-performing optimizer(s) out of a portfolio of optimization algorithms is usually a difficult and complex task. It gets even worse, if the underlying functions are unknown, i.e., so-called Black-Box problems, and function evaluations are considered to be expensive. In the case of continuous single-objective optimization problems, Exploratory Landscape Analysis (ELA) - a sophisticated and effective approach for characterizing the landscapes of such problems by means of numeric...
Creating Gaze Annotations in Head Mounted Displays

DEFF Research Database (Denmark)

Mardanbeigi, Diako; Qvarfordt, Pernilla

2015-01-01

To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation...
SAS- Semantic Annotation Service for Geoscience resources on the web

Science.gov (United States)

Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

2015-12-01

There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosemantic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous resources to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use the existing resources based on standard vocabularies that are encoded and published using semantic technologies.
FRAGSTATS: spatial pattern analysis program for quantifying landscape structure.

Science.gov (United States)

Kevin McGarigal; Barbara J. Marks

1995-01-01

This report describes a program, FRAGSTATS, developed to quantify landscape structure. FRAGSTATS offers a comprehensive choice of landscape metrics and was designed to be as versatile as possible. The program is almost completely automated and thus requires little technical training. Two separate versions of FRAGSTATS exist: one for vector images and one for raster...
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Science.gov (United States)

Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

2015-01-01

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh
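One common way to derive a silver standard from several systems' output is agreement voting, scored against a gold standard with precision, recall, and F1. The Python sketch below shows that idea only; the abstract does not specify the combination strategy the authors used, so the `min_votes` threshold, the annotation tuple layout, the system names, and all data are assumptions made for illustration.

```python
from collections import Counter


def silver_standard(system_outputs, min_votes=2):
    """Keep annotations proposed by at least min_votes systems."""
    votes = Counter()
    for annotations in system_outputs.values():
        votes.update(set(annotations))  # each system votes once per annotation
    return {annotation for annotation, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 of predicted against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented annotations: (document id, start offset, end offset, concept id).
a1 = ("doc1", 0, 5, "C1")
a2 = ("doc1", 10, 15, "C2")
a3 = ("doc1", 20, 25, "C3")

outputs = {
    "system_a": {a1, a2},
    "system_b": {a1},
    "system_c": {a1, a2, a3},
}
gold = {a1, a3}

silver = silver_standard(outputs, min_votes=2)
print(precision_recall_f1(silver, gold))
```

Raising `min_votes` generally trades recall for precision, which is one way a low-recall system can pull down the combined F1-measure in the manner the abstract reports.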
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Directory of Open Access Journals (Sweden)

Anika Oellrich

Full Text Available Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content
Landscape analysis of urban growth patterns in Seremban, Malaysia, using spatio-temporal data

Science.gov (United States)

Aburas, Maher M.; Abdullah, Sabrina H.; Ramli, Mohammad F.; As'shari, Zulfa H.

2016-06-01

Urban growth is one of the major issues that have played a significant role in destroying the ecosystem in recent years. Landscape analysis is an important technique widely used to evaluate urban growth patterns. In this study, four land-use maps from 1984, 1990, 2000, and 2010 have been used to analyze an urban landscape. The values of a built-up area were initially computed using a geographic information system environment based on the spatial gradient approach. Mathematical matrices were then used to determine the amount of change in urban patches in each direction. Results of the number of patches, landscape shape index, aggregation index, and total edges confirmed that the urban patches in Seremban, Malaysia, have become more dispersed from 2000 to 2010. The urban patches have also become more continuous, especially in the north-western part of Seremban as a result of the urban development in the Nilai District. These results indicate the necessity to create new policies in the city to protect the sustainability of the land use of Seremban.

Landscape morphology metrics for urban areas: analysis of the role of vegetation in the management of the quality of urban environment

Directory of Open Access Journals (Sweden)

Danilo Marques de Magalhães

2013-05-01

Full Text Available This study aims to demonstrate the applicability of landscape metric analysis to fragments of urban land use. More specifically, it focuses on low vegetation cover, arboreal and shrubby vegetation, and their distribution across land uses. Differences in vegetation cover in dense urban areas are explained. It also briefly discusses the state of the art in Landscape Ecology and landscape metrics. It develops, as an example, a case study in Belo Horizonte, Minas Gerais, Brazil. For this study, it uses area metrics and the relations between area, perimeter, core, and circumscribed circle. From this analysis, this paper proposes the definition of priority areas for conservation, urban parks, free spaces of common land, linear parks and green corridors. It is demonstrated that, in order to design urban landscape, studies of two-dimensional landscape representations are still useful, but should consider the systemic relation between different factors related to shape and land use.
Fluid Annotations in an Open World

DEFF Research Database (Denmark)

Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning

2001-01-01

Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to layer fluid annotations and links on top of arbitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required.
Analysis of landscape patterns and their relationship with oak (Quercus humboldtii Bonpl.) regeneration in the municipality of Popayan, Cauca

International Nuclear Information System (INIS)

Cabezas Gaviria Alexander; Ospina Montealegre Roman

2010-01-01

Landscape patterns were determined for three different areas having oak populations in the Popayan municipality (Clarete, Rejoya and Pisoje). Two Landsat images from different years and polygons with areas equal to or greater than 1.5 hectares were used for land use classification. Patch Analysis software was used to determine quantitative variables. Structure description included: number of patches, mean patch size, mean patch index, mean patch fractal dimension and mean perimeter-area ratio. Dispersion and fragmentation were evaluated with three indexes: Mean Nearest Neighbor Distance, Mean Proximity Index and Interspersion Juxtaposition Index. Community variables included basal area, terrain slope, light percentage and regeneration density, and were measured in an area of 3600 m² for each landscape. Landscape and community information were analyzed using principal component analysis (PCA). The first two components explained 91.4% of data variability; they were determined more by landscape variables than by community factors. Correlation analysis and the Kruskal-Wallis test showed that the variables of major importance regarding oak tree regeneration were the Neighbor Distance in secondary forest patches, the Mean Proximity Index in oak tree forest patches and the Juxtaposition Index in patches of planted forests.
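Three of the structure metrics named above (number of patches, mean patch size, mean perimeter-area ratio) can be sketched in Python on a small binary raster. This is an illustration under stated assumptions and not the Patch Analysis software used in the study: patches are defined by the 4-neighbour rule, perimeter is counted as exposed cell edges, and the grid, `cell_size`, and function names are invented.

```python
def label_patches(grid):
    """Group cells with value 1 into patches using 4-neighbour connectivity."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    patches = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            stack = [(r, c)]
            seen.add((r, c))
            cells = []
            while stack:  # iterative flood fill from the seed cell
                cr, cc = stack.pop()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] == 1 and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            patches.append(cells)
    return patches


def patch_edge_count(cells):
    """Count cell edges that border a non-patch cell or the raster boundary."""
    cell_set = set(cells)
    edges = 0
    for r, c in cells:
        for neighbour in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if neighbour not in cell_set:
                edges += 1
    return edges


def landscape_metrics(grid, cell_size=1.0):
    """Number of patches, mean patch size, and mean perimeter-area ratio."""
    patches = label_patches(grid)
    count = len(patches)
    if count == 0:
        return {"number_of_patches": 0, "mean_patch_size": 0.0,
                "mean_perimeter_area_ratio": 0.0}
    areas = [len(p) * cell_size ** 2 for p in patches]
    perimeters = [patch_edge_count(p) * cell_size for p in patches]
    ratios = [perimeter / area for perimeter, area in zip(perimeters, areas)]
    return {"number_of_patches": count,
            "mean_patch_size": sum(areas) / count,
            "mean_perimeter_area_ratio": sum(ratios) / count}


# Invented raster: 1 = focal land-cover class, 0 = everything else.
raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(landscape_metrics(raster))
```

Switching to an 8-neighbour rule would merge diagonally touching cells into one patch, so the connectivity choice should be reported alongside any patch counts.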
Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome

Energy Technology Data Exchange (ETDEWEB)

Milacic, Marija; Haw, Robin, E-mail: robin.haw@oicr.on.ca; Rothfels, Karen; Wu, Guanming [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada); Croft, David; Hermjakob, Henning [European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD (United Kingdom); D’Eustachio, Peter [Department of Biochemistry, NYU School of Medicine, New York, NY 10016 (United States); Stein, Lincoln [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada)

2012-11-08

Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.
Annotating cancer variants and anti-cancer therapeutics in reactome.

Science.gov (United States)

Milacic, Marija; Haw, Robin; Rothfels, Karen; Wu, Guanming; Croft, David; Hermjakob, Henning; D'Eustachio, Peter; Stein, Lincoln

2012-11-08

Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of "cancer" pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.
Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome

International Nuclear Information System (INIS)

Milacic, Marija; Haw, Robin; Rothfels, Karen; Wu, Guanming; Croft, David; Hermjakob, Henning; D’Eustachio, Peter; Stein, Lincoln

2012-01-01

Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

Science.gov (United States)

Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

2015-12-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
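The two evaluation measures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The Python sketch below uses the standard definitions; the counts are hypothetical and are not taken from the study, and returning 0.0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when a row or column of the confusion matrix is empty;
        # 0.0 is used here by convention (an assumption, not from the paper).
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: coding genes are positives, non-coding are negatives.
tp, tn, fp, fn = 90, 85, 15, 10
print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

MCC ranges from -1 to 1 and stays informative when the two classes differ greatly in size, which is why it is often reported next to accuracy for gene-prediction benchmarks.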
Paternity analysis of pollen-mediated gene flow for Fraxinus excelsior L. in a chronically fragmented landscape.

Science.gov (United States)

Bacles, C F E; Ennos, R A

2008-10-01

Paternity analysis based on microsatellite marker genotyping was used to infer contemporary genetic connectivity by pollen of three population remnants of the wind-pollinated, wind-dispersed tree Fraxinus excelsior, in a deforested Scottish landscape. By deterministically accounting for genotyping error and comparing a range of assignment methods, individual-based paternity assignments were used to derive population-level estimates of gene flow. Pollen immigration into a 300 ha landscape represents between 43 and 68% of effective pollination, mostly depending on assignment method. Individual male reproductive success is unequal, with 31 of 48 trees fertilizing one seed or more, but only three trees fertilizing more than ten seeds. Spatial analysis suggests a fat-tailed pollen dispersal curve with 85% of detected pollination occurring within 100 m, and 15% spreading between 300 and 1900 m from the source. Identification of immigrating pollen sourced from two neighbouring remnants indicates further effective dispersal at 2900 m. Pollen exchange among remnants is driven by population size rather than geographic distance, with larger remnants acting predominantly as pollen donors, and smaller remnants as pollen recipients. Enhanced wind dispersal of pollen in a barren landscape ensures that the seed produced within the catchment includes genetic material from a wide geographic area. However, gene flow estimates based on analysis of non-dispersed seeds were shown to underestimate realized gene immigration into the remnants by a factor of two suggesting that predictive landscape conservation requires integrated estimates of post-recruitment gene flow occurring via both pollen and seed.
Sustaining ecosystem services in cultural landscapes

Directory of Open Access Journals (Sweden)

Tobias Plieninger

2014-06-01

Full Text Available Classical conservation approaches focus on the man-made degradation of ecosystems and tend to neglect the social-ecological values that human land uses have imprinted on many environments. Throughout the world, ingenious land-use practices have generated unique cultural landscapes, but these are under pressure from agricultural intensification, land abandonment, and urbanization. In recent years, the cultural landscapes concept has been broadly adopted in science, policy, and management. The interest in both outstanding and vernacular landscapes finds expression in the UNESCO World Heritage Convention, the European Landscape Convention, and the IUCN Protected Landscape Approach. These policies promote the protection, management, planning, and governance of cultural landscapes. The ecosystem services approach is a powerful framework to guide such efforts, but has rarely been applied in landscape research and management. With this paper, we introduce a special feature that aims to enhance the theoretical, empirical and practical knowledge of how to safeguard the resilience of ecosystem services in cultural landscapes. It concludes (1) that the usefulness of the ecosystem services approach to the analysis and management of cultural landscapes should be reviewed more critically; (2) that conventional ecosystem services assessment needs to be complemented by socio-cultural valuation; (3) that cultural landscapes are inherently changing, so that a dynamic view on ecosystem services and a focus on drivers of landscape change are needed; and (4) that managing landscapes for ecosystem services provision may benefit from a social-ecological resilience perspective.
Annotation of toponyms in TEI digital literary editions and linking to the web of data

Directory of Open Access Journals (Sweden)

Frontini, Francesca

2016-07-01

This paper aims to discuss the challenges and benefits of the annotation of place names in literary texts and literary criticism. We shall first highlight the problems of encoding spatial information in digital editions using the TEI format by means of two manual annotation experiments and the discussion of various cases. This will lead to the question of how to use existing semantic web resources to complement and enrich toponym mark-up, in particular to provide mentions with precise georeferencing. Finally, the automatic annotation of a large corpus will show the potential of visualizing places from texts, by illustrating an analysis of the evolution of literary life from the spatial and geographical point of view.
Construction of coffee transcriptome networks based on gene annotation semantics

Directory of Open Access Journals (Sweden)

Castillo Luis F.

2012-12-01

Gene annotation is a process that encompasses multiple approaches to the analysis of nucleic acids or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and InterProScan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained from the WordNet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.
Black English Annotations for Elementary Reading Programs.

Science.gov (United States)

Prasad, Sandre

This report describes a program that uses annotations in the teacher's editions of existing reading programs to indicate the characteristics of black English that may interfere with the reading process of black children. The first part of the report provides a rationale for the annotation approach, explaining that the discrepancy between written…
Special Issue: Annotated Bibliography for Volumes XIX-XXXII.

Science.gov (United States)

Pullin, Richard A.

1998-01-01

This annotated bibliography lists 310 articles from the "Journal of Cooperative Education" from Volumes XIX-XXXII, 1983-1997. Annotations are presented in the order they appear in the journal; author and subject indexes are provided. (JOW)
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

Science.gov (United States)

Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

2017-07-03

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
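The clustering criterion quoted in this abstract (sequence identity ≥40% with alignment coverage ≥90%) can be illustrated with a minimal sketch. It assumes, purely for illustration, that clusters are the connected components of the pairwise similarity graph; the abstract does not say how BAR 3.0 builds its clusters beyond "graph-based", and the function name, sequence identifiers and similarity values below are hypothetical.

```python
from collections import defaultdict


def cluster_sequences(ids, pairs, min_identity=0.40, min_coverage=0.90):
    """Group sequences into connected components of the similarity graph.

    `pairs` holds (id_a, id_b, identity, coverage) tuples; an edge is kept
    only when both thresholds are met. Connected components are an
    assumption of this sketch, not a documented detail of BAR 3.0.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for i in ids:
        clusters[find(i)].add(i)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical pairwise alignment results.
ids = ["P1", "P2", "P3", "P4"]
pairs = [
    ("P1", "P2", 0.55, 0.95),  # passes both thresholds
    ("P2", "P3", 0.42, 0.91),  # passes both thresholds
    ("P3", "P4", 0.80, 0.50),  # fails the coverage threshold
]
clusters = cluster_sequences(ids, pairs)
print(clusters)
```

In this toy input, P4 remains a singleton because its only alignment covers too little of the sequence, which mirrors the singletons mentioned in the abstract.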
Annotated corpus and the empirical evaluation of probability estimates of grammatical forms

Directory of Open Access Journals (Sweden)

Ševa Nada

2003-01-01

The aim of the present study is to demonstrate the usage of an annotated corpus in the field of experimental psycholinguistics. Specifically, we demonstrate how the manually annotated Corpus of Serbian Language (Kostić, Đ. 2001) can be used for probability estimates of grammatical forms, which allow the control of independent variables in psycholinguistic experiments. We address the issue of processing Serbian inflected forms within two subparadigms of feminine nouns. In regression analysis, almost all processing variability of inflected forms has been accounted for by the amount of information (i.e. bits) carried by the presented forms. In spite of the fact that probability distributions of inflected forms for the two paradigms differ, it was shown that the best prediction of processing variability is obtained by the probabilities derived from the predominant subparadigm, which encompasses about 80% of feminine nouns. The relevance of annotated corpora in experimental psycholinguistics is discussed in more detail.
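The link between a form's corpus probability and the information it carries in bits can be sketched as follows. This assumes the standard self-information measure, -log2(p); the abstract does not spell out the exact formula used in the study, and the form labels and probabilities below are hypothetical placeholders, not values from the Corpus of Serbian Language.

```python
import math


def information_bits(probability):
    """Return the self-information, in bits, of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical probability estimates for three inflected forms.
form_probabilities = {"form_a": 0.5, "form_b": 0.25, "form_c": 0.25}
bits = {form: information_bits(p) for form, p in form_probabilities.items()}
print(bits)
```

Rarer forms carry more bits, so under this measure a less probable inflected form would be predicted to take longer to process.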
BAT: An open-source, web-based audio events annotation tool

OpenAIRE

Blai Meléndez-Catalan, Emilio Molina, Emilia Gómez

2017-01-01

In this paper we present BAT (BMAT Annotation Tool), an open-source, web-based tool for the manual annotation of events in audio recordings developed at BMAT (Barcelona Music and Audio Technologies). The main feature of the tool is that it provides an easy way to annotate the salience of simultaneous sound sources. Additionally, it allows users to define multiple ontologies to adapt to multiple tasks and offers the possibility to cross-annotate audio data. Moreover, it is easy to install and deploy...
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

Energy Technology Data Exchange (ETDEWEB)

Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.; Porwollik, Steffen; Jones, Marcus B.; Yoon, Hyunjin; Payne, Samuel H.; Martin, Jessica L.; Burnet, Meagan C.; Monroe, Matthew E.; Venepally, Pratap; Smith, Richard D.; Peterson, Scott; Heffron, Fred; Mcclelland, Michael; Adkins, Joshua N.

2011-08-25

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.
Annotating images by mining image search results.

Science.gov (United States)

Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

2008-11-01

Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes; the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.
Collaborative Paper-Based Annotation of Lecture Slides

Science.gov (United States)

Steimle, Jurgen; Brdiczka, Oliver; Muhlhauser, Max

2009-01-01

In a study of notetaking in university courses, we found that the large majority of students prefer paper to computer-based media like Tablet PCs for taking notes and making annotations. Based on this finding, we developed CoScribe, a concept and system which supports students in making collaborative handwritten annotations on printed lecture…
Music journals in South Africa 1854-2010: an annotated bibliography

African Journals Online (AJOL)

Music journals in South Africa 1854-2010: an annotated bibliography. ... The article focuses on presenting an annotated bibliography of music journalism in South Africa from as early as 1854 until 2010. Most of ... Key words: annotated bibliography, electronic journals, music journals, periodicals, South African music history ...

Landscape Studio

DEFF Research Database (Denmark)

Hansen, Peter Lundsgaard

2017-01-01

Landscape studio documents is the biography of the method 'design conversation' and contributes to the way we work with landscapes. The blog communicates renewed landscape didactics and leads to the innovation of design practices....
Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies

Science.gov (United States)

Zhang, Shujun

2018-01-01

Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896
Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies.

Directory of Open Access Journals (Sweden)

Xingjie Hao

2018-01-01

Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study.
An annotated corpus with nanomedicine and pharmacokinetic parameters.

Science.gov (United States)

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
Plann: A command-line application for annotating plastome sequences

Science.gov (United States)

Huang, Daisie I.; Cronk, Quentin C. B.

2015-01-01

Premise of the study: Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Methods and Results: Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann’s output can be used in the National Center for Biotechnology Information’s tbl2asn to create a Sequin file for GenBank submission. Conclusions: Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved. PMID:26312193
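The idea of shifting a reference feature's interval to its location in a new plastome can be sketched as below. This is not Plann's code (Plann is a Perl script, and the abstract does not describe its matching procedure); the sketch assumes exact substring matching for simplicity, and the function name, sequences and feature names are hypothetical.

```python
def transfer_features(reference_seq, reference_features, new_seq):
    """Map feature intervals from a reference sequence onto a new sequence.

    `reference_features` maps a feature name to a 0-based, end-exclusive
    (start, end) interval. A feature is transferred only if its reference
    subsequence occurs verbatim in `new_seq` (an assumption of this sketch;
    a real tool would tolerate mismatches).
    """
    transferred = {}
    for name, (start, end) in reference_features.items():
        fragment = reference_seq[start:end]
        new_start = new_seq.find(fragment)
        if new_start != -1:
            transferred[name] = (new_start, new_start + len(fragment))
    return transferred


# Hypothetical toy sequences and features.
reference_seq = "AAATGCCCTTTGGG"
reference_features = {"geneA": (2, 8), "geneB": (8, 14)}
new_seq = "CCAAATGCCCTTT"

shifted = transfer_features(reference_seq, reference_features, new_seq)
print(shifted)
```

Here geneA is found two positions further along in the new sequence, while geneB has no match and is left unannotated.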
Dynamic Changes of Landscape Pattern and Vulnerability Analysis in Qingyi River Basin

Science.gov (United States)

Li, Ziwei; Xie, Chaoying; He, Xiaohui; Guo, Hengliang; Wang, Li

2017-11-01

Environmental vulnerability research is one of the core areas of global environmental change research. Over the past 10 years, ecologically fragile zones and transition zones have been significantly affected by environmental degradation, climate change and human activities. In this paper, we analyzed the spatial and temporal changes of landscape pattern and landscape vulnerability degree in Qingyi River Basin by calculating the landscape sensitivity index and landscape restoration degree index based on Landsat images of 2005, 2010 and 2015. The results showed that: (1) Farmland underwent the most conversion; woodland and grassland areas decreased, while urban land and rural residential land increased fastest. (2) The fragility of the landscape pattern along the Qingyi River gradually increased between 2005 and 2015, and the downstream area was affected by human activities. (3) Landscape pattern changes and fragility are mainly affected by urbanization. These findings are helpful for understanding the evolution of landscape pattern as well as urban ecology, both of which have significant implications for urban planning and for minimizing the potential environmental impacts of urbanization in Qingyi River Basin.
Assessment of disease named entity recognition on a corpus of annotated sentences.

Science.gov (United States)

Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich

2008-04-11

that dictionary look-up already provides competitive results indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall while our statistical method obtains better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two of the three methods leading, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
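The precision and recall trade-off described here, where requiring agreement between at least two of three annotators raises precision but lowers recall, can be illustrated with a small sketch. The mention identifiers and annotator outputs are hypothetical toy data, not results from the study, and the voting function is only one plausible way to combine methods.

```python
from collections import Counter


def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def agreed_by_at_least(annotations, k=2):
    """Keep only mentions proposed by at least k of the annotators."""
    votes = Counter(m for annotation in annotations for m in annotation)
    return {mention for mention, count in votes.items() if count >= k}


# Hypothetical mention IDs: gold standard and three annotator outputs.
gold = {1, 2, 3, 4}
dictionary = {1, 2, 3, 5}
metamap = {1, 2}
statistical = {1, 2, 3, 4, 6, 7}

combined = agreed_by_at_least([dictionary, metamap, statistical], k=2)
print("statistical:", precision_recall(statistical, gold))
print("combined:   ", precision_recall(combined, gold))
```

In this toy case the statistical annotator alone finds every gold mention but with several false positives, whereas the voted combination has no false positives and misses one mention.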
Human Needs as an Approach to Designed Landscapes

Directory of Open Access Journals (Sweden)

Dalia Aly

2018-03-01

The traditional approach of landscape architecture has always focused on the aesthetic and visual aspects of landscapes while giving less attention to other aspects. This view has limited the benefits that can be derived from designed landscapes, despite the wide-ranging potential they carry for humans, socially, environmentally and economically. As a result, many researchers and practitioners are currently challenging this view to develop a more holistic and multidimensional approach. The present research therefore aims at proposing a new perspective for public designed landscapes based on fundamental human needs. The study methodology comprised critical content analysis of three main domains: sustainable development, human needs in specific relation to public landscapes, and significant approaches to fundamental human needs. Reconciliation among these domains was achieved based on a modified version of Max-Neef’s matrix of fundamental human needs. Human needs in public landscapes were merged into the matrix to reach a comprehensive yet specific perspective. The study concluded with a conceptual framework that can provide a wider perspective to human needs in designed landscapes. It proposes a new tool for the analysis of the benefits of public landscapes and their value for humans, which can be further used in various applications.
Passive Solar Landscape Design: Its Impact on Fossil Fuel Consumption Through Landscape Design

OpenAIRE

Boelt, Robin Wiatt

2006-01-01

Gas, electricity, heating and cooling buildings - comfort - our lives revolve around fossil fuels. Technology and the demands of living in today's society add to our gigantic fossil fuel appetite. With gas prices topping three dollars per gallon, changes must be made. This thesis project presents an analysis of passive solar landscape design (PSLD) principles used to create microclimates within the landscape, and thereby increasing human comfort both indoors and outdoors. The ...
Avoiding inconsistencies over time and tracking difficulties in Applied Biosystems AB1700™/Panther™ probe-to-gene annotations

Directory of Open Access Journals (Sweden)

Benecke Arndt

2005-12-01

Background: Significant inconsistencies in probe-to-gene annotations between different releases of probe set identifiers by commercial microarray platform solutions have been reported. Such inconsistencies lead to misleading or ambiguous interpretation of published gene expression results. Results: We report here similar inconsistencies in the probe-to-gene annotation of Applied Biosystems AB1700 data, demonstrating that this is not an isolated concern. Moreover, the online information source PANTHER does not provide information required to track such inconsistencies; hence, even correctly annotated datasets, when resubmitted after PANTHER was updated to a new probe-to-gene annotation release, will generate differing results without any feedback on the origin of the change. Conclusion: The importance of unequivocal annotation of microarray experiments should not be underestimated. Inconsistencies greatly diminish the usefulness of the technology. Novel methods in the analysis of transcriptome profiles often rely on large disparate datasets stemming from multiple sources. The predictive and analytic power of such approaches rapidly diminishes if only least-common subsets can be used for analysis. We present here the information that needs to be provided together with the raw AB1700 data, and the information required together with the biologic interpretation of such data to avoid inconsistencies and tracking difficulties.
Evaluation of web-based annotation of ophthalmic images for multicentric clinical trials.

Science.gov (United States)

Chalam, K V; Jain, P; Shah, V A; Shah, Gaurav Y

2006-06-01

An Internet browser-based annotation system can be used to identify and describe features in digitalized retinal images, in multicentric clinical trials, in real time. In this web-based annotation system, the user employs a mouse to draw and create annotations on a transparent layer, that encapsulates the observations and interpretations of a specific image. Multiple annotation layers may be overlaid on a single image. These layers may correspond to annotations by different users on the same image or annotations of a temporal sequence of images of a disease process, over a period of time. In addition, geometrical properties of annotated figures may be computed and measured. The annotations are stored in a central repository database on a server, which can be retrieved by multiple users in real time. This system facilitates objective evaluation of digital images and comparison of double-blind readings of digital photographs, with an identifiable audit trail. Annotation of ophthalmic images allowed clinically feasible and useful interpretation to track properties of an area of fundus pathology. This provided an objective method to monitor properties of pathologies over time, an essential component of multicentric clinical trials. The annotation system also allowed users to view stereoscopic images that are stereo pairs. This web-based annotation system is useful and valuable in monitoring patient care, in multicentric clinical trials, telemedicine, teaching and routine clinical settings.
Snpdat: Easy and rapid annotation of results from de novo snp discovery projects for model and non-model organisms

Directory of Open Access Journals (Sweden)

Doran Anthony G

2013-02-01

Background: Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many tools available are optimised for use with organisms densely sampled for SNPs, such as humans. There are currently few tools available that are species non-specific or support non-model organism data. Results: Here we present SNPdat, a high throughput analysis tool that can provide a comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. Using a dataset of 4,566 SNPs identified in cattle using high-throughput DNA sequencing we demonstrate the annotations performed and the statistics that can be generated by SNPdat. Conclusions: SNPdat provides users with a simple tool for annotation of genomes that are either not supported by other tools or have a small number of annotated SNPs available. SNPdat can also be used to analyse datasets from organisms which are densely sampled for SNPs. As a command line tool it can easily be incorporated into existing SNP discovery pipelines and fills a niche for analyses involving non-model organisms that are not supported by many available SNP annotation tools. SNPdat will be of great interest to scientists involved in SNP discovery and analysis projects, particularly those with limited bioinformatics experience.
Essential Annotation Schema for Ecology (EASE)-A framework supporting the efficient data annotation and faceted navigation in ecology.

Science.gov (United States)

Pfaff, Claas-Thido; Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

2017-01-01

Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
Essential Annotation Schema for Ecology (EASE)-A framework supporting the efficient data annotation and faceted navigation in ecology.

Directory of Open Access Journals (Sweden)

Claas-Thido Pfaff

Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
Unraveling Landscape Complexity: Land Use/Land Cover Changes and Landscape Pattern Dynamics (1954-2008) in Contrasting Peri-Urban and Agro-Forest Regions of Northern Italy.

Science.gov (United States)

Smiraglia, D; Ceccarelli, T; Bajocco, S; Perini, L; Salvati, L

2015-10-01

This study implements an exploratory data analysis of landscape metrics and a change detection analysis of land use and population density to assess landscape dynamics (1954-2008) in two physiographic zones (plain and hilly-mountain area) of Emilia Romagna, northern Italy. The two areas are characterized by different landscape types: a mixed urban-rural landscape dominated by arable land and peri-urban settlements in the plain and a traditional agro-forest landscape in the hilly-mountain area with deciduous and conifer forests, scrublands, meadows, and crop mosaic. Urbanization and, to a lesser extent, agricultural intensification were identified as the processes underlying landscape change in the plain. Land abandonment determining natural forestation and re-forestation driven by man was identified as the process of change most representative of the hilly-mountain area. Trends in landscape metrics indicate a shift toward more fragmented and convoluted patterns in both areas. Number of patches, the interspersion and juxtaposition index, and the large patch index are the metrics discriminating the two areas in terms of landscape patterns in 1954. In 2008, mean patch size, edge density, interspersion and juxtaposition index, and mean Euclidean nearest neighbor distance were the metrics with the most different spatial patterns in the two areas. The exploratory data analysis of landscape metrics helped to link changes over time in both landscape composition and configuration, providing a comprehensive picture of landscape transformations in a wealthy European region. It is hoped that the evidence from this study will inform sustainable land management designed for homogeneous landscape units in similar socioeconomic contexts.
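Two of the metrics named in this abstract, number of patches and mean patch size, can be sketched on a small categorical raster. The Python example below is an illustration only: the grid, the class codes, and the choice of a 4-cell neighbour rule are assumptions, and the abstract does not say which software or neighbour rule the authors used.

```python
from collections import deque


def patch_sizes(grid, target, neighbours=4):
    """Sizes (in cells) of all connected patches of class `target`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if neighbours == 8:
        # Optionally treat diagonal cells as connected too.
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited cell of the class.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Hypothetical land-cover raster: "F" = forest, "A" = arable land.
land_cover = [
    ["F", "F", "A", "A"],
    ["F", "A", "A", "F"],
    ["A", "A", "F", "F"],
]
forest = patch_sizes(land_cover, "F")
number_of_patches = len(forest)
mean_patch_size = sum(forest) / len(forest)
print(number_of_patches, mean_patch_size)
```

Multiplying the cell counts by the cell area would convert mean patch size into map units such as hectares.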
Detecting the Land-Cover Changes Induced by Large-Physical Disturbances Using Landscape Metrics, Spatial Sampling, Simulation and Spatial Analysis

Directory of Open Access Journals (Sweden)

Hone-Jay Chu

2009-08-01

The objectives of the study are to integrate conditional Latin Hypercube Sampling (cLHS), sequential Gaussian simulation (SGS), and spatial analysis in remotely sensed images to monitor the effects of large chronological disturbances on spatial characteristics of landscape changes, including spatial heterogeneity and variability. The multiple NDVI images demonstrate that spatial patterns of disturbed landscapes were successfully delineated by spatial analysis such as the variogram, Moran's I, and landscape metrics in the study area. The hybrid method delineates the spatial patterns and spatial variability of landscapes caused by these large disturbances. The cLHS approach is applied to select samples from Normalized Difference Vegetation Index (NDVI) images derived from SPOT HRV images of the Chenyulan watershed in Taiwan, and SGS with sufficient samples is then used to generate maps of NDVI images. Finally, the simulated NDVI maps are verified using indices such as the correlation coefficient and mean absolute error (MAE). The statistics and spatial structures of the multiple NDVI images behave very robustly, which supports the use of the index for quantifying landscape spatial patterns and land cover change. In addition, the results transferred by Open Geospatial techniques can be accessed from web-based and end-user applications for watershed management.
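The two verification indices mentioned in this abstract, the correlation coefficient and the mean absolute error, can be written out in a few lines of Python. This is a generic sketch, not the authors' code: the function names and the NDVI values are made up for illustration, and the Pearson form of the correlation coefficient is assumed.

```python
import math


def mean_absolute_error(observed, simulated):
    """MAE: average of |observed - simulated| over paired values."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical NDVI values at four pixels: observed image vs. simulated map.
ndvi_observed = [0.2, 0.4, 0.6, 0.8]
ndvi_simulated = [0.25, 0.35, 0.65, 0.75]
mae = mean_absolute_error(ndvi_observed, ndvi_simulated)
r = pearson_r(ndvi_observed, ndvi_simulated)
print(round(mae, 3), round(r, 3))
```

A low MAE together with a correlation close to 1 indicates that the simulated map reproduces both the magnitude and the spatial ordering of the observed values.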
Thinking big: linking rivers to landscapes

Science.gov (United States)

Joan O’Callaghan; Ashley E. Steel; Kelly M. Burnett

2012-01-01

Exploring relationships between landscape characteristics and rivers is an emerging field, enabled by the proliferation of satellite data, advances in statistical analysis, and increased emphasis on large-scale monitoring. Landscape features such as road networks, underlying geology, and human developments determine the characteristics of the rivers flowing through...
Annotating Logical Forms for EHR Questions.

Science.gov (United States)

Roberts, Kirk; Demner-Fushman, Dina

2016-05-01

This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into a structured query. A layered annotation strategy is used which mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.
Managing and Querying Image Annotation and Markup in XML

Science.gov (United States)

Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

2010-01-01

Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

Annotation of mammalian primary microRNAs

Directory of Open Access Journals (Sweden)

Enright Anton J

2008-11-01

Background: MicroRNAs (miRNAs) are important regulators of gene expression and have been implicated in development, differentiation and pathogenesis. Hundreds of miRNAs have been discovered in mammalian genomes. Approximately 50% of mammalian miRNAs are expressed from introns of protein-coding genes; the primary transcript (pri-miRNA) is therefore assumed to be the host transcript. However, very little is known about the structure of pri-miRNAs expressed from intergenic regions. Here we annotate transcript boundaries of miRNAs in human, mouse and rat genomes using various transcription features. The 5' end of the pri-miRNA is predicted from transcription start sites, CpG islands and 5' CAGE tags mapped in the upstream flanking region surrounding the precursor miRNA (pre-miRNA). The 3' end of the pri-miRNA is predicted based on the mapping of polyA signals, and supported by cDNA/EST and ditags data. The predicted pri-miRNAs are also analyzed for promoter and insulator-associated regulatory regions. Results: We define sets of conserved and non-conserved human, mouse and rat pre-miRNAs using bidirectional BLAST and synteny analysis. Transcription features in their flanking regions are used to demarcate the 5' and 3' boundaries of the pri-miRNAs. The lengths and boundaries of primary transcripts are highly conserved between orthologous miRNAs. A significant fraction of pri-miRNAs have lengths between 1 and 10 kb, with very few introns. We annotate a total of 59 pri-miRNA structures, which include 82 pre-miRNAs. 36 pri-miRNAs are conserved in all 3 species. In total, 18 of the confidently annotated transcripts express more than one pre-miRNA. The upstream regions of 54% of the predicted pri-miRNAs are found to be associated with promoter and insulator regulatory sequences. Conclusion: Little is known about the primary transcripts of intergenic miRNAs. Using comparative data, we are able to identify the boundaries of a significant proportion of
RCAS: an RNA centric annotation system for transcriptome-wide regions of interest.

Science.gov (United States)

Uyar, Bora; Yusuf, Dilmurat; Wurmus, Ricardo; Rajewsky, Nikolaus; Ohler, Uwe; Akalin, Altuna

2017-06-02

In the field of RNA, the technologies for studying the transcriptome have created a tremendous potential for deciphering the puzzles of the RNA biology. Along with the excitement, the unprecedented volume of RNA related omics data is creating great challenges in bioinformatics analyses. Here, we present the RNA Centric Annotation System (RCAS), an R package, which is designed to ease the process of creating gene-centric annotations and analysis for the genomic regions of interest obtained from various RNA-based omics technologies. The design of RCAS is modular, which enables flexible usage and convenient integration with other bioinformatics workflows. RCAS is an R/Bioconductor package but we also created graphical user interfaces including a Galaxy wrapper and a stand-alone web service. The application of RCAS on published datasets shows that RCAS is not only able to reproduce published findings but also helps generate novel knowledge and hypotheses. The meta-gene profiles, gene-centric annotation, motif analysis and gene-set analysis provided by RCAS provide contextual knowledge which is necessary for understanding the functional aspects of different biological events that involve RNAs. In addition, the array of different interfaces and deployment options adds the convenience of use for different levels of users. RCAS is available at http://bioconductor.org/packages/release/bioc/html/RCAS.html and http://rcas.mdc-berlin.de. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A gradient analysis on urban sprawl and urban landscape pattern between 1985 and 2000 in the Pearl River Delta, China

Science.gov (United States)

Dai, Erfu; Wu, Zhuo; Du, Xiaodian

2017-04-01

Urbanization is an irreversible trend worldwide, especially in rapidly developing China. Accelerated urbanization has resulted in rapid urban sprawl and urban landscape pattern changes. Quantifying the spatiotemporal dynamics of urban land use and landscape pattern not only can reveal the characteristics of social transfer and economic development, but also can provide insights into the driving mechanisms of land use changes. In this study, we integrated remote sensing (RS), geographic information system (GIS), landscape metrics, and gradient analysis to quantitatively compare the spatiotemporal dynamics of land use, urban sprawl, and landscape pattern for nine cities in the Pearl River Delta from 1985‒2000. For the whole study region, urbanization was obvious. The results show an increase in urban buildup land and shrinkage of cropland in the Pearl River Delta. However, the nine cities differed greatly in terms of the process and magnitude of urban sprawl for both the spatial and temporal dimensions. This was most evident for the cities of Guangzhou and Shenzhen. Gradient analysis on urban landscape changes could deepen understanding of the stages of urban development and provide a scientific foundation for future urban planning and land management strategies in China.
MULTI-TEMPORAL ANALYSIS OF LANDSCAPES AND URBAN AREAS

Directory of Open Access Journals (Sweden)

E. Nocerino

2012-07-01

This article presents a 4D modelling approach that employs multi-temporal and historical aerial images to derive spatio-temporal information for scenes and landscapes. Such imagery represents a unique data source which, combined with photo interpretation and reality-based 3D reconstruction techniques, can offer a more complete modelling procedure because it adds the fourth dimension of time to 3D geometrical representation and thus allows urban planners, historians, and others to identify, describe, and analyse changes in individual scenes and buildings as well as across landscapes. Particularly important to this approach are historical aerial photos, which provide data about the past that can be collected, processed, and then integrated as a database. The proposed methodology employs both historical (1945) and more recent (1973 and 2000s) aerial images from the Trentino region in North-eastern Italy in order to create a multi-temporal database of information to assist researchers in many disciplines such as topographic mapping, geology, geography, architecture, and archaeology as they work to reconstruct building phases and to understand landscape transformations (Fig. 1).
eHistology image and annotation data from the Kaufman Atlas of Mouse Development.

Science.gov (United States)

Baldock, Richard A; Armit, Chris

2017-12-20

"The Atlas of Mouse Development" by Kaufman is a classic paper atlas that is the de facto standard for the definition of mouse embryo anatomy in the context of standard histological images. We have re-digitised the original H&E stained tissue sections used for the book at high resolution and transferred the hand-drawn annotations to digital form. We have augmented the annotations with standard ontological assignments (EMAPA anatomy) and made the data freely available via an online viewer (eHistology) and from the University of Edinburgh DataShare archive. The dataset captures and preserves the definitive anatomical knowledge of the original atlas, provides a core image set for deeper community annotation and teaching, and delivers a unique high-quality set of high-resolution histological images through mammalian development for manual and automated analysis. © The Authors 2017. Published by Oxford University Press.
Landscape incorporation in environmental impact studies

International Nuclear Information System (INIS)

Gutierrez G, Luz Angela

2000-01-01

The article gives a general overview of landscape analysis, presenting the two principal approaches to its study, and emphasizes the need to take landscape into consideration when preparing the environmental impact study of any development project.
Multiview Hessian regularization for image annotation.

Science.gov (United States)

Liu, Weifeng; Tao, Dacheng

2013-07-01

The rapid development of computer hardware and Internet technology makes large-scale, data-dependent models computationally tractable and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) has therefore received intensive attention in recent years and has been successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smooths the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function toward a constant function, which may result in poor generalization. In addition, LR was developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function so that it varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
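The Laplacian regularization (LR) penalty that this abstract contrasts with Hessian regularization can be made concrete with a small Python sketch. For a symmetric weight matrix W with zero diagonal, the graph Laplacian is L = D - W, and the penalty on label scores f is f^T L f, which equals one half of the sum over i, j of w_ij (f_i - f_j)^2. The three-node path graph and the function names below are assumptions for illustration; this sketches the standard graph Laplacian penalty only, not the mHR method of the paper.

```python
def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(f, weights):
    """Smoothness penalty f^T L f for label scores f on the graph."""
    lap = graph_laplacian(weights)
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical 3-node path graph: node 0 - node 1 - node 2.
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
constant_scores = [1.0, 1.0, 1.0]
linear_scores = [0.0, 1.0, 2.0]
print(laplacian_penalty(constant_scores, W))  # constant function: no penalty
print(laplacian_penalty(linear_scores, W))    # linearly varying function: penalized
```

On a connected graph only a constant function has zero penalty, so even a linearly varying function is penalized. This is consistent with the bias toward constant functions that the abstract attributes to LR.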
Ten steps to get started in Genome Assembly and Annotation

Science.gov (United States)

Dominguez Del Angel, Victoria; Hjerde, Erik; Sterck, Lieven; Capella-Gutierrez, Salvadors; Notredame, Cederic; Vinnere Pettersson, Olga; Amselem, Joelle; Bouri, Laurent; Bocs, Stephanie; Klopp, Christophe; Gibrat, Jean-Francois; Vlasova, Anna; Leskosek, Brane L.; Soler, Lucile; Binzer-Panchal, Mahesh; Lantz, Henrik

2018-01-01

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR). PMID:29568489
Sharing Map Annotations in Small Groups: X Marks the Spot

Science.gov (United States)

Congleton, Ben; Cerretani, Jacqueline; Newman, Mark W.; Ackerman, Mark S.

Advances in location-sensing technology, coupled with an increasingly pervasive wireless Internet, have made it possible (and increasingly easy) to access and share information with context of one’s geospatial location. We conducted a four-phase study, with 27 students, to explore the practices surrounding the creation, interpretation and sharing of map annotations in specific social contexts. We found that annotation authors consider multiple factors when deciding how to annotate maps, including the perceived utility to the audience and how their contributions will reflect on the image they project to others. Consumers of annotations value the novelty of information, but must be convinced of the author’s credibility. In this paper we describe our study, present the results, and discuss implications for the design of software for sharing map annotations.
Qualifying Urban Landscapes

DEFF Research Database (Denmark)

Clemmensen, Thomas Juel; Nielsen, Tom; Daugaard, Morten

2010-01-01

The article presents an attempt to develop alternatives to the dominant planning and design principles used in building and rebuilding the contemporary urban landscape. The basic idea is that the ‘forces of modernisation’ driving current development might result in a broader and more interesting...... for contemporary urban landscape design practice....... to the task of constructing and improving things. With this goal, a set of objectives based in important insights from recent urban theory are formulated constituting the normative spine of the analysis of a number of found situations as basis for formulating eight generic concepts of qualification...
Annotation of the Domestic Pig Genome by Quantitative Proteogenomics.

Science.gov (United States)

Marx, Harald; Hahne, Hannes; Ulbrich, Susanne E; Schnieke, Angelika; Rottmann, Oswald; Frishman, Dmitrij; Kuster, Bernhard

2017-08-04

The pig is one of the earliest domesticated animals in the history of human civilization and represents one of the most important livestock animals. The recent sequencing of the Sus scrofa genome was a major step toward the comprehensive understanding of porcine biology, evolution, and its utility as a promising large animal model for biomedical and xenotransplantation research. However, the functional and structural annotation of the Sus scrofa genome is far from complete. Here, we present mass spectrometry-based quantitative proteomics data of nine juvenile organs and six embryonic stages between 18 and 39 days after gestation. We found that the data provide evidence for and improve the annotation of 8176 protein-coding genes including 588 novel and 321 refined gene models. The analysis of tissue-specific proteins and the temporal expression profiles of embryonic proteins provides an initial functional characterization of expressed protein interaction networks and modules including as yet uncharacterized proteins. Comparative transcript and protein expression analysis to human organs reveal a moderate conservation of protein translation across species. We anticipate that this resource will facilitate basic and applied research on Sus scrofa as well as its porcine relatives.
MiMiR: a comprehensive solution for storage, annotation and exchange of microarray data

Directory of Open Access Journals (Sweden)

Rahman Fatimah

2005-11-01

Background: The generation of large amounts of microarray data presents challenges for data collection, annotation, exchange and analysis. Although there are now widely accepted formats, minimum standards for data content and ontologies for microarray data, only a few groups are using them together to build and populate large-scale databases. Structured environments for data management are crucial for making full use of these data. Description: The MiMiR database provides a comprehensive infrastructure for microarray data annotation, storage and exchange and is based on the MAGE format. MiMiR is MIAME-supportive, customised for use with data generated on the Affymetrix platform and includes a tool for data annotation using ontologies. Detailed information on the experiment, methods, reagents and signal intensity data can be captured in a systematic format. Report screens permit the user to query the database and to view annotation on individual experiments, and they provide summary statistics. MiMiR has tools for automatic upload of the data from the microarray scanner and export to databases using MAGE-ML. Conclusion: MiMiR facilitates microarray data management, annotation and exchange, in line with international guidelines. The database is valuable for underpinning research activities and promotes a systematic approach to data handling. Copies of MiMiR are freely available to academic groups under licence.
An ontology-based annotation of cardiac implantable electronic devices to detect therapy changes in a national registry.

Science.gov (United States)

Rosier, Arnaud; Mabo, Philippe; Chauvin, Michel; Burgun, Anita

2015-05-01

The patient population benefitting from cardiac implantable electronic devices (CIEDs) is increasing. This study introduces a device annotation method that supports the consistent description of the functional attributes of cardiac devices and evaluates how this method can detect device changes from a CIED registry. We designed the Cardiac Device Ontology, an ontology of CIEDs and device functions. We annotated 146 cardiac devices with this ontology and used it to detect therapy changes with respect to atrioventricular pacing, cardiac resynchronization therapy, and defibrillation capability in a French national registry of patients with implants (STIDEFIX). We then analyzed a set of 6905 device replacements from the STIDEFIX registry. Ontology-based identification of therapy changes (upgraded, downgraded, or similar) was accurate (6905 cases) and performed better than straightforward analysis of the registry codes (F-measure 1.00 versus 0.75 to 0.97). This study demonstrates the feasibility and effectiveness of ontology-based functional annotation of devices in the cardiac domain. Such annotation allowed a better description and in-depth analysis of STIDEFIX. This method was useful for the automatic detection of therapy changes and may be reused for analyzing data from other device registries.
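The F-measure used in this abstract to compare ontology-based detection with registry-code analysis is the harmonic mean of precision and recall. The Python sketch below computes it for one class of therapy change. The label lists are invented for illustration and do not come from the STIDEFIX registry, and the function name `f_measure` is hypothetical.

```python
def f_measure(gold, predicted, positive):
    """Precision, recall and F1 for one class, from paired gold/predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up labels for five device replacements.
gold_labels = ["upgraded", "similar", "downgraded", "upgraded", "similar"]
predicted_labels = ["upgraded", "similar", "similar", "upgraded", "upgraded"]
precision, recall, f1 = f_measure(gold_labels, predicted_labels, "upgraded")
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

An F-measure of 1.00, as reported for the ontology-based method, requires both precision and recall to be 1, that is, no false positives and no false negatives for the class.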
The heritage and landscapes: new concepts for old ideas?

Directory of Open Access Journals (Sweden)

Vanessa Gayego Bello Figueiredo

2013-12-01

This article investigates the relationship between landscape and heritage and offers a brief critical analysis of the practice of the United Nations Educational, Scientific and Cultural Organization (UNESCO) from the institutionalization of the cultural landscape category, created on the World Heritage list in 1992, until 2012. The text is structured in three parts. The first presents a brief historical approach to the concept of Western landscape. The second presents recent formulations on the cultural landscape based on international conventions, such as the Council of Europe (1995) and the European Landscape Convention (2000). The third part focuses on the analysis of the World Heritage Committee's work, covering the main characteristics and values of the cultural landscapes listed. Finally, the study reveals how the use of this new concept still reflects old conceptions of landscape and preservation, although it opens perspectives for heritage policies, especially as regards the expansion of the heritage concept itself and the rapprochement between the natural and cultural, material and immaterial dimensions.
The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

Directory of Open Access Journals (Sweden)

Saeideh Ahangari

2010-05-01

In our modern technological world, Computer-Assisted Language Learning (CALL) is a new realm for learning a language in general, and for learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners' vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotations, dynamic picture annotations, and written annotations) on L2 vocabulary learning. To fulfill this objective, the researchers selected sixty-four EFL learners as participants. The participants were randomly assigned to one of four groups: a control group that received no annotations and three experimental groups that received still picture annotations, dynamic picture annotations, or written annotations, respectively. Each participant was required to take a pre-test. A vocabulary post-test was also designed and administered to the participants in order to assess the efficacy of each annotation. First, a paired t-test was conducted for each group between its pre-test and post-test scores in order to observe its improvement; then the performance of the four groups was compared through an ANCOVA test. The results showed that using multimedia annotations resulted in a significant difference in the participants' vocabulary learning. Based on the results of the present study, multimedia annotations are suggested as a vocabulary teaching strategy.
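The paired t-test applied to each group's pre-test and post-test scores reduces to a one-sample test on the score differences: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The Python sketch below computes that statistic using only the standard library. The scores are invented for illustration and are not data from the study, and the function name `paired_t_statistic` is hypothetical.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for paired pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up vocabulary scores for four learners in one group.
pre_test = [10, 12, 9, 11]
post_test = [14, 15, 12, 13]
t_value, dof = paired_t_statistic(pre_test, post_test)
print(round(t_value, 3), dof)
```

The statistic is then compared with the t distribution at the given degrees of freedom to obtain a p-value. Comparing the four groups while adjusting for pre-test scores, as the ANCOVA step does, needs a separate model and is not covered by this sketch.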
Landscape metrics for three-dimension urban pattern recognition

Science.gov (United States)

Liu, M.; Hu, Y.; Zhang, W.; Li, C.

2017-12-01

Understanding how landscape pattern determines population or ecosystem dynamics is crucial for managing our landscapes. Urban areas are becoming increasingly dominant social-ecological systems, so it is important to understand patterns of urbanization. Most studies of urban landscape pattern examine land-use maps in two dimensions because the acquisition of 3-dimensional information is difficult. We used Brista software based on Quickbird images and aerial photos to interpret the height of buildings, thus incorporating a 3-dimensional approach. We estimated the feasibility and accuracy of this approach. A total of 164,345 buildings in the Liaoning central urban agglomeration of China, which included seven cities, were measured. Twelve landscape metrics were proposed or chosen to describe the urban landscape patterns in 2- and 3-dimensional scales. The ecological and social meaning of landscape metrics were analyzed with multiple correlation analysis. The results showed that classification accuracy compared with field surveys was 87.6%, which means this method for interpreting building height was acceptable. The metrics effectively reflected the urban architecture in relation to number of buildings, area, height, 3-D shape and diversity aspects. We were able to describe the urban characteristics of each city with these metrics. The metrics also captured ecological and social meanings. The proposed landscape metrics provided a new method for urban landscape analysis in three dimensions.
Aesthetic appreciation of the cultural landscape through social media : An analysis of revealed preference in the Dutch river landscape

NARCIS (Netherlands)

Tieskens, Koen F.; Van Zanten, Boris T.; Schulp, Catharina J.E.; Verburg, Peter H.

2018-01-01

Aesthetic enjoyment and perception are increasingly recognized as important values of cultural landscapes. The study of these values transcends mere physical attributes of the landscape and requires assessment of its social meaning. In recent years the usage of social media has gained momentum to
Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

Science.gov (United States)

Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

2017-01-01

The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Tasks ranging from identifying the nearest well-annotated homologue of a protein of interest, to predicting where misannotation has occurred, to knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.
Anticipating forest and range land development in central Oregon (USA) for landscape analysis, with an example application involving mule deer

Science.gov (United States)

Jeffrey D. Kline; Alissa Moses; Theresa Burcsu

2010-01-01

Forest policymakers, public lands managers, and scientists in the Pacific Northwest (USA) seek ways to evaluate the landscape-level effects of policies and management through the multidisciplinary development and application of spatially explicit methods and models. The Interagency Mapping and Analysis Project (IMAP) is an ongoing effort to generate landscape-wide...
SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation

DEFF Research Database (Denmark)

Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik

2007-01-01

MOTIVATION: Single nucleotide polymorphism (SNP) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large scale sequencing projects. The increasing availability of sequence trace data...... manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining of polymorphisms based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non...

AutoFACT: An Automatic Functional Annotation and Classification Tool

Directory of Open Access Journals (Sweden)

Lang B Franz

2005-06-01

Background: Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results: We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion: AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at http://megasun.bch.umontreal.ca/Software/AutoFACT.htm.
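
The idea of combining several BLAST reports into one informative description can be illustrated with a minimal sketch. This is not AutoFACT's actual algorithm or code (the tool itself is written in PERL). The list of uninformative terms, the database names, the E-value cutoff and the function names below are all assumptions made for illustration.

```python
# Hypothetical sketch: choose one description from several BLAST reports.
# The term list, cutoff and database names are illustrative assumptions,
# not values taken from AutoFACT.
UNINFORMATIVE_TERMS = ("hypothetical", "unknown", "unnamed", "uncharacterized")


def is_informative(description):
    """Return True if the description contains none of the uninformative terms."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE_TERMS)


def pick_annotation(hits_by_db, db_priority, max_evalue=1e-5):
    """Walk the databases in user-chosen order and return the first
    significant, informative hit as (database, description).

    hits_by_db maps a database name to a list of (description, e_value)
    tuples, ordered best hit first.
    """
    for db in db_priority:
        for description, e_value in hits_by_db.get(db, []):
            if e_value <= max_evalue and is_informative(description):
                return db, description
    return None, "unclassified"


hits = {
    "nr": [("hypothetical protein XP_0001", 1e-80),
           ("ATP synthase subunit alpha", 1e-60)],
    "uniref90": [("ATP synthase alpha chain", 1e-75)],
}
chosen = pick_annotation(hits, ["uniref90", "nr"])
print(chosen)
```

A plain top-hit approach restricted to the "nr" report would return the uninformative "hypothetical protein" entry here, whereas the sketch skips it and keeps looking.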
LeARN: a platform for detecting, clustering and annotating non-coding RNAs

Directory of Open Access Journals (Sweden)

Schiex Thomas

2008-01-01

Background: In the last decade, sequencing projects have led to the development of a number of annotation systems dedicated to the structural and functional annotation of protein-coding genes. These annotation systems manage the annotation of the non-protein coding genes (ncRNAs) in a very crude way, allowing neither the editing of the secondary structures nor the clustering of ncRNA genes into families, both of which are crucial for appropriate annotation of these molecules. Results: LeARN is a flexible software package which handles the complete process of ncRNA annotation by integrating the layers of automatic detection and human curation. Conclusion: This software provides the infrastructure to deal properly with ncRNAs in the framework of any annotation project. It fills the gap between existing prediction software, which detects independent ncRNA occurrences, and public ncRNA repositories, which do not offer the flexibility and interactivity required for annotation projects. The software is freely available from the download section of the website http://bioinfo.genopole-toulouse.prd.fr/LeARN
European landscape architecture and territorial strategies for water landscapes

DEFF Research Database (Denmark)

Diedrich, Lisa Babette

2010-01-01

This article sums up the author’s lecture at the 2009 Sydney Resilient Water Landscapes Symposium and presents a series of realized or planned European landscape architectural and urbanistic projects on water landscapes taken from the recently published book On Site/ Landscape Architecture Europe...... and accompanying reflections. The hypothesis is that further scientific research can help define weaknesses and strengths of the existing water landscape designs in terms of resilience, extract principles and tools, improve the weak ones and communicate the strong ones and develop general quality criteria...... and tools for future resilient water landscapes....
Essential Annotation Schema for Ecology (EASE)—A framework supporting the efficient data annotation and faceted navigation in ecology

Science.gov (United States)

Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

2017-01-01

Ecology has become a data-intensive science over the last decades, one which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows users to refine the results towards greater relevance. We developed a framework for efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines. PMID:29023519
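
The filter mechanism described in the abstract, a full text search narrowed by categorical and numeric facets, can be sketched roughly as follows. This is a minimal illustration and not the EASE implementation. The record fields (`title`, `habitat`, `year`), the sample records and the function names are hypothetical, and the real framework stores annotations in an XML schema rather than Python dictionaries.

```python
from collections import Counter

# Hypothetical annotated search objects; field names and values are invented.
records = [
    {"title": "Leaf traits in grassland plots", "habitat": "grassland", "year": 2012},
    {"title": "Soil carbon in forest plots", "habitat": "forest", "year": 2015},
    {"title": "Leaf litter decomposition in forest", "habitat": "forest", "year": 2009},
    {"title": "Leaf area index of beech forest", "habitat": "forest", "year": 2016},
]


def search(items, query="", categorical=None, numeric_ranges=None):
    """Full text match on the title, then narrow by facet filters.

    categorical maps a field to its required value; numeric_ranges maps
    a field to an inclusive (low, high) tuple.
    """
    categorical = categorical or {}
    numeric_ranges = numeric_ranges or {}
    hits = []
    for rec in items:
        if query.lower() not in rec["title"].lower():
            continue
        if any(rec.get(field) != value for field, value in categorical.items()):
            continue
        if any(field not in rec or not (low <= rec[field] <= high)
               for field, (low, high) in numeric_ranges.items()):
            continue
        hits.append(rec)
    return hits


def facet_counts(items, field):
    """Count how many results fall under each value of a categorical facet."""
    return Counter(rec[field] for rec in items if field in rec)


text_hits = search(records, "leaf")
forest_hits = search(records, "leaf", categorical={"habitat": "forest"})
recent_forest_hits = search(records, "leaf",
                            categorical={"habitat": "forest"},
                            numeric_ranges={"year": (2010, 2020)})
print(len(text_hits), len(forest_hits), len(recent_forest_hits))
```

Each added facet can only shrink the result set, which is what lets a user move from a broad text query towards the few records that qualify for reuse.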
Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis).

Directory of Open Access Journals (Sweden)

Timothy I Shaw

The Jamaican fruit bat (Artibeus jamaicensis) is one of the most common bats in the tropical Americas. It is thought to be a potential reservoir host of Tacaribe virus, an arenavirus closely related to the South American hemorrhagic fever viruses. We performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and Illumina platforms to develop this species as an animal model. More than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. Of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. Annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncRNAs. Reciprocal BLAST best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. Species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. Through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. We also identified 466 immune-related genes, which may be useful for studying Tacaribe virus infection of this species. The Jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease.
Analysis on Key Points of Construction and Management of Municipal Landscape Engineering

Science.gov (United States)

Liang, Mingxia; Fei, Cheng

2018-02-01

China is making great efforts to build an ecological civilization and to advance ecological protection and environmental construction, which is of real practical significance for maintaining the country's ecological balance and environmental quality. As urbanization advances and public awareness of environmental protection gradually improves, the demands placed on urban greening are rising as well. Rational planning of urban landscaping draws on knowledge from many disciplines. In the greening process, the system of urban development and construction in China should be fully considered, and landscaping projects should be based on the city's development design and long-term planning. In addition, the traditional layout of the urban area and its physical and geographical conditions must be taken into account to make urban landscaping more objective and scientifically grounded. Effective management of municipal landscape engineering is therefore of great practical significance for ensuring the quality of landscaping.
College and University Rankings: Part 2--An Annotated Bibliography of Analysis, Criticism, and Evaluation.

Science.gov (United States)

Hattendorf, Lynn C.

1987-01-01

This annotated bibliography of recent articles and books on academic rankings updates an article in the Spring 1986 "RQ." Items are listed by subject and ranking in general; individual guides; subject areas including accounting, advertising, biogeography, business, communications, data communications, economics, music, publishing,…
ONEMercury: Towards Automatic Annotation of Earth Science Metadata

Science.gov (United States)

Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

2012-12-01

Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the
Changing Landscape

DEFF Research Database (Denmark)

Tunby Gulbrandsen, Ib; Kamstrup, Andreas; Koed Madsen, Anders

with an analysis of the changing organizational landscape created by new ICTs like Google, Facebook, Wikipedia, iPods, smart phones and Wi-Fi. Based on five netno- and ethno-graphic investigations of the intertwinement of ICTs and organizational work, we point to three features that have changed the scene: new...
A Selected Annotated Bibliography on Work Time Options.

Science.gov (United States)

Ivantcho, Barbara

This annotated bibliography is divided into three sections. Section I contains annotations of general publications on work time options. Section II presents resources on flexitime and the compressed work week. In Section III are found resources related to these reduced work time options: permanent part-time employment, job sharing, voluntary…
Systematic interpretation of microarray data using experiment annotations

Directory of Open Access Journals (Sweden)

Frohme Marcus

2006-12-01

Background: Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results: We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.
The National Cancer Informatics Program (NCIP) Annotation and Image Markup (AIM) Foundation model.

Science.gov (United States)

Mongkolwat, Pattanasak; Kleper, Vladimir; Talbot, Skip; Rubin, Daniel

2014-12-01

Knowledge contained within in vivo imaging annotated by human experts or computer programs is typically stored as unstructured text and separated from other associated information. The National Cancer Informatics Program (NCIP) Annotation and Image Markup (AIM) Foundation information model is an evolution of the National Institutes of Health's (NIH) National Cancer Institute's (NCI) Cancer Bioinformatics Grid (caBIG®) AIM model. The model applies to various image types created by various techniques and disciplines. It has evolved in response to the feedback and changing demands from the imaging community at NCI. The foundation model serves as a base for other imaging disciplines that want to extend the type of information the model collects. The model captures physical entities and their characteristics, imaging observation entities and their characteristics, markups (two- and three-dimensional), AIM statements, calculations, image source, inferences, annotation role, task context or workflow, audit trail, AIM creator details, equipment used to create AIM instances, subject demographics, and adjudication observations. An AIM instance can be stored as a Digital Imaging and Communications in Medicine (DICOM) structured reporting (SR) object or Extensible Markup Language (XML) document for further processing and analysis. An AIM instance consists of one or more annotations and associated markups of a single finding along with other ancillary information in the AIM model. An annotation describes information about the meaning of pixel data in an image. A markup is a graphical drawing placed on the image that depicts a region of interest. This paper describes fundamental AIM concepts and how to use and extend AIM for various imaging disciplines.
The propagation of varied timescale perturbations in landscapes

Science.gov (United States)

Bingham, N.; Johnson, K. N.; Bookhagen, B.; Chadwick, O.

2016-12-01

The classic assumption of steady-state landscapes greatly simplifies models of earth-surface processes. Theoretically, steady-state denotes time independence, but in real landscapes steady-state requires a timescale over which to assume (or document) no change. In the past, poor spatiotemporal resolution of eroding landscapes necessitated that shorter timescale perturbations be ignored in favor of regional formulations of rock uplift = erosion, 10^5-10^6 years. Now, novel techniques and technologies provide an opportunity to define local landscape response to various timescales of perturbations, thus allowing us to consider multiple steady-states on adjacent watersheds or even along a single watershed. This study seeks to identify the physical propagation of varied timescale perturbations in landscapes in order to provide an updated geomorphic context for interpreting critical zone processes. At our study site - Santa Cruz Island (SCI), CA - perturbations include sea level and climate fluctuations over 10^5 years coupled with pulses of overgrazing and extreme storm events during the last 200 years. Comprehensive knickpoint location maps and dated marine and fill terraces tighten the spatiotemporal constraints on erosion for SCI. In addition, the island hosts a wide range of lithologies, allowing us to compare lithologic effects on landscape response to perturbations. Our study uses lidar point clouds and high resolution (0.25 and 1 m) digital elevation model analysis to segment landscapes by the degree of their response to perturbations. Landscape response is measured by increases in topographic roughness. We ascertain roughness by analyzing the changes in different terrain attributes on multiple spatial scales: catchment, sub-catchments and individual hillslopes. Terrain attributes utilized include slope, curvature, local relief, flowpath length and contributing catchment area. Statistical analysis of these properties indicates narrower ranges in values for regions
Prepare-Participate-Connect: Active Learning with Video Annotation

Science.gov (United States)

Colasante, Meg; Douglas, Kathy

2016-01-01

Annotation of video provides students with the opportunity to view and engage with audiovisual content in an interactive and participatory way rather than in passive-receptive mode. This article discusses research into the use of video annotation in four vocational programs at RMIT University in Melbourne, which allowed students to interact with…
Developing Annotation Solutions for Online Data Driven Learning

Science.gov (United States)

Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

2009-01-01

Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…
Effects of Reviewing Annotations and Homework Solutions on Math Learning Achievement

Science.gov (United States)

Hwang, Wu-Yuin; Chen, Nian-Shing; Shadiev, Rustam; Li, Jin-Sing

2011-01-01

Previous studies have demonstrated that making annotations can be a meaningful and useful learning method that promotes metacognition and enhances learning achievement. A web-based annotation system, Virtual Pen (VPEN), which provides for the creation and review of annotations and homework solutions, has been developed to foster learning process…
TOPSAN: use of a collaborative environment for annotating, analyzing and disseminating data on JCSG and PSI structures

International Nuclear Information System (INIS)

Krishna, S. Sri; Weekes, Dana; Bakolitsa, Constantina; Elsliger, Marc-André; Wilson, Ian A.; Godzik, Adam; Wooley, John

2010-01-01

Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications. The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in
Spatiotemporal Analysis of Urban Land Cover Changes in Kigali, Rwanda Using Multitemporal Landsat Data and Landscape Metrics

Science.gov (United States)

Mugiraneza, T.; Haas, J.; Ban, Y.

2017-11-01

Mapping urbanization and ensuing environmental impacts using satellite data combined with landscape metrics has become a hot research topic. The objectives of the study are to analyze the spatio-temporal evolution of urbanization patterns of Kigali, Rwanda over the last three decades (from 1984 to 2015) using multitemporal Landsat data and to assess the associated environmental impact using landscape metrics. Landsat images, Normalized Difference Vegetation Index (NDVI), Grey Level Co-occurrence Matrix (GLCM) variance texture and digital elevation model (DEM) data were classified using a support vector machine (SVM). Eight landscape indices were derived from classified images for urbanization environment impact assessment. Seven land cover classes were derived with an overall accuracy exceeding 88 % with Kappa Coefficients around 0.8. As most prominent changes, cropland was reduced considerably in favour of built-up areas that increased from 2,349 ha to 11,579 ha between 1984 and 2015. During those 31 years, the increased number of patches in most land cover classes illustrated landscape fragmentation, especially for forest. The landscape configuration indices demonstrate that in general the land cover pattern remained stable for cropland but it was highly changed in built-up areas. Satellite-based analysis and quantification of urbanization and its effects using landscape metrics are found to be interesting for grassroots and provide a cost-effective method for urban information production. This information can be used for e.g. potential design and implementation of early warning systems that cater for urbanization effects.
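
The two accuracy figures quoted in the abstract, overall accuracy and the Kappa coefficient, are both derived from a classification confusion matrix. The sketch below shows the standard calculation on a small three-class matrix. The matrix values are invented for illustration and are not the study's data, and the function names are assumptions.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what row and column totals predict by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    observed = overall_accuracy(matrix)
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (rows: reference classes, columns: mapped classes).
confusion = [
    [50, 5, 5],
    [4, 40, 6],
    [6, 5, 79],
]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than overall accuracy because it discounts the agreement expected by chance, which is why a map with an overall accuracy near 88% can have a Kappa closer to 0.8.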
Protannotator: a semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome.

Science.gov (United States)

Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba

2014-01-03

The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).
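
The percentages in this abstract all use the 3831 "missing" proteins as the denominator, except the ∼19% figure, which is relative to the 20,128 proteins listed in neXtProt. A short check of that arithmetic, using only the counts stated in the abstract, might look like the following. The dictionary keys and the helper name are illustrative assumptions.

```python
# Counts as stated in the abstract; the key names are illustrative labels.
TOTAL_PROTEINS = 20128
MISSING_PROTEINS = 3831

counts = {
    "nonhuman mammalian homologues with protein evidence": 1271,
    "nonhuman mammalian homologues": 703,
    "human homologues": 564,
    "functionally annotated": 1945,
    "ENCODE proteogenomic evidence": 107,
}


def percent(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_share = percent(MISSING_PROTEINS, TOTAL_PROTEINS)
shares = {label: percent(n, MISSING_PROTEINS) for label, n in counts.items()}

print(f"missing share of proteome: {missing_share}%")
for label, value in shares.items():
    print(f"{label}: {value}%")
```

Recomputing the shares this way reproduces the figures reported in the abstract to one decimal place.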
A Set of Annotation Interfaces for Alignment of Parallel Corpora

Directory of Open Access Journals (Sweden)

Singh Anil Kumar

2014-09-01

Annotation interfaces for parallel corpora which fit in well with other tools can be very useful. We describe a set of annotation interfaces which fulfill this criterion. This set includes a sentence alignment interface, two different word or word group alignment interfaces and an initial version of a parallel syntactic annotation alignment interface. These tools can be used for manual alignment, or they can be used to correct automatic alignments. Manual alignment can be performed in combination with certain kinds of linguistic annotation. Most of these interfaces use a representation called the Shakti Standard Format that has been found to be very robust and has been used for large and successful projects. It ties together the different interfaces, so that the data created by them is portable across all tools which support this representation. The existence of a query language for data stored in this representation makes it possible to build tools that allow easy search and modification of annotated parallel data.

LocusTrack: Integrated visualization of GWAS results and genomic annotation.

Science.gov (United States)

Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

2015-01-01

Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.
"Annotated Lectures": Student-Instructor Interaction in Large-Scale Global Education

Directory of Open Access Journals (Sweden)

Roger Diehl

2009-10-01

We describe an "Annotated Lectures" system, which will be used in a global virtual teaching and student collaboration event on embodied intelligence presented by the University of Zurich. The lectures will be broadcast via video-conference to lecture halls of different universities around the globe. Among other collaboration features, an "Annotated Lectures" system will be implemented in a 3D collaborative virtual environment and used by the participating students to make annotations to the video-recorded lectures, which will be sent to and answered by their supervisors, and forwarded to the lecturers in an aggregated way. The "Annotated Lectures" system aims to overcome the issues of limited student-instructor interaction in large-scale education, and to foster an intercultural and multidisciplinary discourse among students who review the lectures in a group. After presenting the concept of the "Annotated Lectures" system, we discuss a prototype version including a description of the technical components and its expected benefit for large-scale global education.
Annotation an effective device for student feedback: a critical review of the literature.

Science.gov (United States)

Ball, Elaine C

2010-05-01

The paper examines hand-written annotation, its many features, difficulties and strengths as a feedback tool. It extends and clarifies what modest evidence is in the public domain and offers an evaluation of how to use annotation effectively in the support of student feedback [Marshall, C.M., 1998a. The Future of Annotation in a Digital (paper) World. Presented at the 35th Annual GLSLIS Clinic: Successes and Failures of Digital Libraries, June 20-24, University of Illinois at Urbana-Champaign, March 24, pp. 1-20; Marshall, C.M., 1998b. Toward an ecology of hypertext annotation. Hypertext. In: Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia, June 20-24, Pittsburgh Pennsylvania, US, pp. 40-49; Wolfe, J.L., Nuewirth, C.M., 2001. From the margins to the centre: the future of annotation. Journal of Business and Technical Communication, 15(3), 333-371; Diyanni, R., 2002. One Hundred Great Essays. Addison-Wesley, New York; Wolfe, J.L., 2002. Marginal pedagogy: how annotated texts affect writing-from-source texts. Written Communication, 19(2), 297-333; Liu, K., 2006. Annotation as an index to critical writing. Urban Education, 41, 192-207; Feito, A., Donahue, P., 2008. Minding the gap annotation as preparation for discussion. Arts and Humanities in Higher Education, 7(3), 295-307; Ball, E., 2009. A participatory action research study on handwritten annotation feedback and its impact on staff and students. Systemic Practice and Action Research, 22(2), 111-124; Ball, E., Franks, H., McGrath, M., Leigh, J., 2009. Annotation is a valuable tool to enhance learning and assessment in student essays. Nurse Education Today, 29(3), 284-291]. Although a significant number of studies examine annotation, this is largely related to on-line tools and computer mediated communication and not hand-written annotation as comment, phrase or sign written on the student essay to provide critique. Little systematic research has been conducted to consider how this latter form
BioCause: Annotating and analysing causality in the biomedical domain.

Science.gov (United States)

Mihăilă, Claudiu; Ohta, Tomoko; Pyysalo, Sampo; Ananiadou, Sophia

2013-01-16

Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining. We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems. Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new
Overcoming function annotation errors in the Gram-positive pathogen Streptococcus suis by a proteomics-driven approach

Directory of Open Access Journals (Sweden)

Bárcena José A

2008-12-01

Background: Annotation of protein-coding genes is a key step in sequencing projects. Protein functions are mainly assigned on the basis of the amino acid sequence alone, by searching for homologous proteins. However, fully automated annotation processes often lead to wrong predictions of protein function, and therefore time-intensive manual curation is often essential. Here we describe a fast and reliable way to correct function annotation in sequencing projects, focusing on surface proteomes. We use a proteomics approach, previously proven to be very powerful for identifying new vaccine candidates against Gram-positive pathogens. It consists of shaving the surface of intact cells with two proteases, the cleavage-site-specific trypsin and the unspecific proteinase K, followed by LC/MS/MS analysis of the resulting peptides. The identified proteins are contrasted by computational analysis and their sequences are inspected to correct possible errors in function prediction. Results: When applied to the zoonotic pathogen Streptococcus suis, of which two strains have been recently sequenced and annotated, we identified a set of surface proteins without cytoplasmic contamination: all the proteins identified had exporting or retention signals towards the outside and/or the cell surface, and viability of protease-treated cells was not affected. The combination of experimental evidence and computational methods allowed us to determine that two of these proteins are putative new extracellular adhesins that had previously been assigned an incorrect cytoplasmic function. One of them is a putative component of the pilus of this bacterium. Conclusion: We illustrate the complementary nature of laboratory-based and computational methods to examine in concert the localization of a set of proteins in the cell, and demonstrate the utility of this proteomics-based strategy to experimentally correct function annotation errors in sequencing projects. This
haploR: an R package for querying web-based annotation tools.

Science.gov (United States)

Zhbannikov, Ilya Y; Arbeev, Konstantin; Ukraintseva, Svetlana; Yashin, Anatoliy I

2017-01-01

We developed haploR, an R package for querying the web-based genome annotation tools HaploReg and RegulomeDB. haploR gathers information in a data frame that is suitable for downstream bioinformatic analyses. This will streamline post-genome-wide association study analysis, enabling rapid discovery and interpretation of genetic associations.
WaveformECG: A Platform for Visualizing, Annotating, and Analyzing ECG Data.

Science.gov (United States)

Winslow, Raimond L; Granite, Stephen; Jurado, Christian

2016-01-01

The electrocardiogram (ECG) is the most commonly collected data in cardiovascular research because of the ease with which it can be measured and because changes in ECG waveforms reflect underlying aspects of heart disease. Accessed through a browser, WaveformECG is an open source platform supporting interactive analysis, visualization, and annotation of ECGs.
Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation.

Science.gov (United States)

Clark, Alex M; Bunin, Barry A; Litterman, Nadia K; Schürer, Stephan C; Visser, Ubbo

2014-01-01

Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

Directory of Open Access Journals (Sweden)

Alex M. Clark

2014-08-01

Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
Automatic Function Annotations for Hoare Logic

Directory of Open Access Journals (Sweden)

Daniel Matichuk

2012-11-01

In systems verification we are often concerned with multiple, inter-dependent properties that a program must satisfy. To prove that a program satisfies a given property, the correctness of intermediate states of the program must be characterized. However, this intermediate reasoning is not always phrased such that it can be easily re-used in the proofs of subsequent properties. We introduce a function annotation logic that extends Hoare logic in two important ways: (1) when proving that a function satisfies a Hoare triple, intermediate reasoning is automatically stored as function annotations, and (2) these function annotations can be exploited in future Hoare logic proofs. This reduces duplication of reasoning between the proofs of different properties, whilst serving as a drop-in replacement for traditional Hoare logic to avoid the costly process of proof refactoring. We explain how this was implemented in Isabelle/HOL and applied to an experimental branch of the seL4 microkernel to significantly reduce the size and complexity of existing proofs.
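For readers unfamiliar with the notation, a Hoare triple {P} c {Q} states that if precondition P holds before command c runs, then postcondition Q holds afterwards. The standard sequencing rule below is textbook Hoare logic, not the annotation logic introduced in the paper. It shows why an intermediate assertion R has to be found whenever two commands are composed, which is the kind of intermediate reasoning the abstract proposes to store and reuse. The concrete two-statement program is a hypothetical illustration.

```latex
\[
\frac{\{P\}\; c_1 \;\{R\} \qquad \{R\}\; c_2 \;\{Q\}}
     {\{P\}\; c_1 ;\, c_2 \;\{Q\}}
\]
\[
\{x = 0\}\;\; x := x + 1 \;\;\{x = 1\}
\qquad
\{x = 1\}\;\; y := 2x \;\;\{x = 1 \land y = 2\}
\]
\[
\{x = 0\}\;\; x := x + 1 ;\; y := 2x \;\;\{x = 1 \land y = 2\}
\]
```

Here the intermediate assertion is R = (x = 1). A later proof about the same program, for example one showing that y is even, would need the same R again.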
Automatically Annotated Mapping for Indoor Mobile Robot Applications

DEFF Research Database (Denmark)

Özkil, Ali Gürcan; Howard, Thomas J.

2012-01-01

This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...... localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally...... consistent, automatically annotated hybrid metric-topological maps that are needed by mobile service robots....
AGORA : Organellar genome annotation from the amino acid and nucleotide references.

Science.gov (United States)

Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

2018-03-29

Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals. We have developed a web application, AGORA, for the fast, user-friendly, and improved annotation of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides the start and end positions of each gene, BLAST results compared with the reference sequence, and a visualization of the gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers. Contact: gangman@dongguk.edu.
An Approach to Function Annotation for Proteins of Unknown Function (PUFs) in the Transcriptome of Indian Mulberry.

Directory of Open Access Journals (Sweden)

K H Dhanyalakshmi

Modern sequencing technologies are generating large volumes of information at the transcriptome and genome level. Translation of this information into biological meaning lags far behind, so a significant portion of the proteins discovered remain proteins of unknown function (PUFs). Attempts to uncover the functional significance of PUFs are limited by the lack of easy, high-throughput functional annotation tools. Here, we report an approach to assign putative functions to PUFs identified in the transcriptome of mulberry, a perennial tree commonly cultivated as a host of the silkworm. We utilized the mulberry PUFs generated from leaf tissues exposed to drought stress at the whole-plant level. A sequence- and structure-based computational analysis predicted the probable function of the PUFs. For rapid and easy annotation of PUFs, we developed an automated pipeline by integrating diverse bioinformatics tools, designated the PUFs Annotation Server (PUFAS), which also provides a web service API (Application Programming Interface) for large-scale analysis up to a genome. The expression analysis of three selected PUFs annotated by the pipeline revealed abiotic stress responsiveness of the genes, and hence their potential role in stress acclimation pathways. The automated pipeline developed here could be extended to assign functions to PUFs from any organism in general. The PUFAS web server is available at http://caps.ncbs.res.in/pufas/ and the web service is accessible at http://capservices.ncbs.res.in/help/pufas.
Visibility Analysis of the Oriental Pearl Based on Digital Landscape Simulation – View from East Daming Road of Shanghai

Directory of Open Access Journals (Sweden)

S. Liu

2015-08-01

As the demand for visual quality of the environment increases, visual analysis plays an increasingly important role in current urban landscape construction and management. Guided by City Image theory, this paper presents a covered scene index “X” to describe the visibility of the target scene, and formulates a digital analysis model based on ArcGIS and 3D simulation. This method is applied to the viewpoint analysis from East Daming Road of the North Bund to the Oriental Pearl in Shanghai, and optimized solutions are proposed according to the results. This simple and objective technique can serve as a useful reference tool for urban landscape planning and management.
Sustaining ecosystem services in cultural landscapes

DEFF Research Database (Denmark)

Plieninger, Tobias; van der Horst, Dan; Schleyer, Christian

2014-01-01

Classical conservation approaches focus on the man-made degradation of ecosystems and tend to neglect the social-ecological values that human land uses have imprinted on many environments. Throughout the world, ingenious land-use practices have generated unique cultural landscapes...... research and management. With this paper, we introduce a special feature that aims to enhance the theoretical, empirical and practical knowledge of how to safeguard the resilience of ecosystem services in cultural landscapes. It concludes (1) that the usefulness of the ecosystem services approach...... to the analysis and management of cultural landscapes should be reviewed more critically; (2) that conventional ecosystem services assessment needs to be complemented by socio-cultural valuation; (3) that cultural landscapes are inherently changing, so that a dynamic view on ecosystem services and a focus...
Systemic Planning: An Annotated Bibliography and Literature Guide. Exchange Bibliography No. 91.

Science.gov (United States)

Catanese, Anthony James

Systemic planning is an operational approach to using scientific rigor and qualitative judgment in a complementary manner. It integrates rigorous techniques and methods from systems analysis, cybernetics, decision theory, and work programing. The annotated reference sources in this bibliography include those works that have been most influential…
THE IDENTIFICATION OF THE LOOKOUT POINTS WITH A ROLE IN THE TOURISM VALORISATION OF LANDSCAPE IN THE DISTRICT OF CICEU. VIEWSHED ANALYSIS

Directory of Open Access Journals (Sweden)

Alexandra-Camelia POTRA

2016-11-01

Arranging lookout points makes it possible to increase the visibility of a territory and so to highlight the potential of its landscape, seen as a resource that can support various socio-economic activities. Visibility analysis of such points is a common topic in territorial planning, particularly for the valorisation of landscapes and the visual impact assessment of their characteristics. The natural and anthropic components of the District of Ciceu together create a valuable landscape resource, most often striking for its picturesque valleys, which preserve the specific character of the territory. In a first phase, the present study uses methods and tools specific to the on-site step (observation, mapping) to identify possible locations, including vestiges and historic buildings, suitable for arranging representative lookout points that present the landscape elements of the District of Ciceu. To carry out the visibility analysis and highlight the potential of the district's landscape as assessed from the mapped lookout points, the Viewshed Analysis method of the ArcGIS software was used. The mapping yields cartographic representations of the results that are useful for landscape recovery activities. The results of the study consist of the visibility areas generated for the lookout points and the development of tourist routes that integrate the lookout points mapped on-site, with the aim of capitalizing on the landscape elements of the District of Ciceu.
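The idea behind a viewshed can be sketched in a few lines. The code below is a simplified line-of-sight test on a small elevation grid, not the algorithm used by ArcGIS; the grid values, the observer height of 1.7 and the function names are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]  # elevation per cell, arbitrary but consistent units


def visible(dem: Grid, obs_r: int, obs_c: int, r: int, c: int,
            observer_height: float = 1.7) -> bool:
    """True if the surface of cell (r, c) can be seen from the observer cell."""
    steps = max(abs(r - obs_r), abs(c - obs_c))
    if steps <= 1:
        return True  # the observer's own cell and its direct neighbours
    eye = dem[obs_r][obs_c] + observer_height
    target = dem[r][c]
    for i in range(1, steps):
        t = i / steps
        # Nearest grid cell along the straight line from observer to target.
        rr = round(obs_r + t * (r - obs_r))
        cc = round(obs_c + t * (c - obs_c))
        sight_line = eye + t * (target - eye)
        if dem[rr][cc] > sight_line:
            return False  # terrain rises above the line of sight
    return True


def viewshed(dem: Grid, obs_r: int, obs_c: int) -> List[List[bool]]:
    """Boolean visibility mask for every cell of the grid."""
    return [[visible(dem, obs_r, obs_c, r, c) for c in range(len(dem[0]))]
            for r in range(len(dem))]


# Hypothetical one-row terrain profile with a ridge in the middle.
profile = [[10.0, 10.0, 50.0, 10.0, 10.0]]
mask = viewshed(profile, 0, 0)
```

With the observer in the leftmost cell, the ridge itself is visible but the two cells behind it are hidden. A lookout point placed on the ridge would see the whole profile.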
Studying Oogenesis in a Non-model Organism Using Transcriptomics: Assembling, Annotating, and Analyzing Your Data.

Science.gov (United States)

Carter, Jean-Michel; Gibbs, Melanie; Breuker, Casper J

2016-01-01

This chapter provides a guide to processing and analyzing RNA-Seq data in a non-model organism. This approach was implemented for studying oogenesis in the Speckled Wood Butterfly Pararge aegeria. We focus in particular on how to perform a more informative primary annotation of your non-model organism by implementing our multi-BLAST annotation strategy. We also provide a general guide to other essential steps in the next-generation sequencing analysis workflow. Before undertaking these methods, we recommend you familiarize yourself with command line usage and fundamental concepts of database handling. Most of the operations in the primary annotation pipeline can be performed in Galaxy (or equivalent standalone versions of the tools) and through the use of common database operations (e.g. to remove duplicates) but other equivalent programs and/or custom scripts can be implemented for further automation.
Landscape services as boundary concept in landscape governance: Building social capital in collaboration and adapting the landscape

NARCIS (Netherlands)

Westerink, Judith; Opdam, Paul; Rooij, Van Sabine; Steingröver, Eveliene

2017-01-01

The landscape services concept provides a lens to study relations within the social-ecological networks that landscapes are, and to identify stakeholders as either providers or beneficiaries. However, landscape services can also be used as a boundary concept in collaborative landscape governance. We
An annotated corpus with nanomedicine and pharmacokinetic parameters

Directory of Open Access Journals (Sweden)

Lewinski NA

2017-10-01

Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2). (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA. Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. Keywords: nanotechnology, informatics, natural language processing, text mining, corpora

An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

Science.gov (United States)

Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D

2017-01-01

We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
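To make the hidden Markov model part of this description concrete, the sketch below decodes a two-state chain (not transcribed / transcribed) over genomic bins with the Viterbi algorithm. It is not FStitch: the binary observations and all probabilities are hand-set, hypothetical values, whereas the abstract says the real method learns its parameters from a small training set with logistic regression.

```python
import math
from typing import List, Sequence

# State 0 = not transcribed, state 1 = transcribed.
# Observation 0 = bin without reads, observation 1 = bin with reads.


def viterbi(obs: Sequence[int],
            start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely state path for a discrete-observation HMM (log space)."""
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    back: List[List[int]] = []
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this bin.
            best = max(range(n_states), key=lambda p: prev[p] + math.log(trans[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters: states are "sticky", emissions are noisy.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.2, 0.8]]

bins = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
path = viterbi(bins, start, trans, emit)
```

Because leaving a state is penalized, the single empty bin inside the covered stretch is absorbed into the transcribed segment, so neighbouring reads are stitched into one contiguous region without reference to any gene annotation.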
Annotated Tsunami bibliography: 1962-1976

International Nuclear Information System (INIS)

Pararas-Carayannis, G.; Dong, B.; Farmer, R.

1982-08-01

This compilation contains annotated citations to nearly 3000 tsunami-related publications from 1962 to 1976 in English and several other languages. The foreign-language citations have English titles and abstracts
LANDSCAPE PATTERN VERSUS FARMING DEVELOPMENT. THE CASE OF THE POST-MARL HOLLOWS (PMH) LANDSCAPE IN POLAND

Directory of Open Access Journals (Sweden)

Iwona MARKUSZEWSKA

2017-04-01

This article discusses the presence of unique landscape elements (post-marl hollows) in an intensively operated farming region. The selected case study (the environs of Krotoszyn in the Wielkopolska region, Poland) represents an example of a human-designed landscape in which post-marl hollows (PMH) were created as a consequence of the soil-marling process. PMH are vital ecological habitats which have an influence on the high quality of the rural landscape. Currently, as PMH are vanishing, it seems essential to find a way to maintain these unique landscape elements. The main aim of this paper is to analyse human-nature relationships during the current changes taking place in the agricultural landscape in order to answer the question: how is this affecting the PMH? The study evaluates farmers’ pro-environmental behaviour and defines their identity with the local landscape, which has been beneficial in predicting the forthcoming shaping of the landscape. The data describing the PMH were collected in the years 1998-2016 through cartographic analysis, a literature review and repeated field research. Additionally, remote sensing and a geoportal database were used to analyse the PMH changes. Interviews and discussions with farmers and representatives of the local administrative body were also conducted. Results indicated that local rural communities are little concerned about the environment, particularly when it comes to the conservation of the local landscape heritage (i.e. PMH). In addition, the lack of support from the administrative body makes it more difficult to maintain the unique landscape pattern described in this paper.
IIS--Integrated Interactome System: a web-based platform for the annotation, analysis and visualization of protein-metabolite-gene-drug interactions by integrating a variety of data sources and tools.

Science.gov (United States)

Carazzolle, Marcelo Falsarella; de Carvalho, Lucas Miguel; Slepicka, Hugo Henrique; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Kobarg, Jörg; Meirelles, Gabriela Vaz

2014-01-01

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two
Metingear: a development environment for annotating genome-scale metabolic models.

Science.gov (United States)

May, John W; James, A Gordon; Steinbeck, Christoph

2013-09-01

Genome-scale metabolic models often lack annotations that would allow them to be used for further analysis. Previous efforts have focused on associating metabolites in the model with a cross reference, but this can be problematic if the reference is not freely available, multiple resources are used or the metabolite is added from a literature review. Associating each metabolite with chemical structure provides unambiguous identification of the components and a more detailed view of the metabolism. We have developed an open-source desktop application that simplifies the process of adding database cross references and chemical structures to genome-scale metabolic models. Annotated models can be exported to the Systems Biology Markup Language open interchange format. Source code, binaries, documentation and tutorials are freely available at http://johnmay.github.com/metingear. The application is implemented in Java with bundles available for MS Windows and Macintosh OS X.
Landscape Potential Analysis for Ecotourism Destination in the Resort II Salak Mountain, Halimun-Salak National Park

Science.gov (United States)

Kusumoarto, A.; Gunawan, A.; Nurazizah, G. R.

2017-10-01

Resort II Salak Mountain has a variety of landscape potential for development as an ecotourism destination, especially the potential of its waterfalls (curug) and sulphur crater (Kawah Ratu). The aim of this study was to identify and analyze the potential landscape resources for developing Resort II Salak Mountain as an ecotourism destination. The research was conducted in two phases: 1) identification of the attraction locations that have potential resources for an ecotourism destination, and 2) analysis of the level of potential landscape resources at each location using Analysis of Tourist Attraction Operational Destination (ATAOD). The study showed that Resort II Salak Mountain has many ecotourism objects that are already used for ecotourism activities, such as hot spring baths, Curug Cigamea, Curug Ngumpet, Curug Seribu, Curug Pangeran, Curug Muara, Curug Cihurang, Kawah Ratu, a camping ground, Curug Kondang and Curug Alami. The waterfalls (curug) are spread widely across the core zone for ecotourism. The camping ground, on the other hand, is located in the business zone, while Kawah Ratu is located in the natural forest, which is included in the buffer zone of Halimun-Salak National Park (HSNP). The results showed that the ecotourism objects with the highest potential value are Kawah Ratu, Curug Seribu, Curug Muara, Curug Kondang and Curug Ngumpet.
Annotation-Based Whole Genomic Prediction and Selection

DEFF Research Database (Denmark)

Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...
Alternative future analysis for assessing the potential impact of climate change on urban landscape dynamics.

Science.gov (United States)

He, Chunyang; Zhao, Yuanyuan; Huang, Qingxu; Zhang, Qiaofeng; Zhang, Da

2015-11-01

Assessing the impact of climate change on urban landscape dynamics (ULD) is the foundation for adapting to climate change and maintaining urban landscape sustainability. This paper demonstrates an alternative future analysis by coupling a system dynamics (SD) and a cellular automata (CA) model. The potential impact of different climate change scenarios on ULD from 2009 to 2030 was simulated and evaluated in the Beijing-Tianjin-Tangshan megalopolis cluster area (BTT-MCA). The results suggested that the integrated model, which combines the advantages of the SD and CA models, has the strengths of spatial quantification and flexibility. Meanwhile, the results showed that the influence of climate change would become more severe over time. In 2030, the potential urban area affected by climate change will be 343.60-1260.66 km² (5.55-20.37% of the total urban area, projected by the no-climate-change-effect scenario). Therefore, the effects of climate change should not be neglected when designing and managing urban landscapes. Copyright © 2015 Elsevier B.V. All rights reserved.
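The two ends of the reported range can be cross-checked against each other, since both percentages refer to the same baseline. The short calculation below derives the total urban area implied by each pair of figures. The resulting total is an inference from the abstract's numbers, not a value the abstract states.

```python
# Figures quoted in the abstract for 2030.
affected_km2 = (343.60, 1260.66)    # lower and upper affected area
affected_share = (0.0555, 0.2037)   # 5.55% and 20.37% of total urban area

# Implied total urban area under the no-climate-change-effect scenario.
implied_totals = [area / share for area, share in zip(affected_km2, affected_share)]

# Both ends should point to roughly the same baseline if the figures agree.
spread_km2 = abs(implied_totals[0] - implied_totals[1])
```

Both ends imply a baseline of roughly 6190 km², and they differ by only a few square kilometres, which is within rounding of the published percentages.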
Creating New Medical Ontologies for Image Annotation A Case Study

CERN Document Server

Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

2012-01-01

Creating New Medical Ontologies for Image Annotation focuses on the problem of automatically annotating medical images, which the authors solve in an original manner. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by the authors are compared with other efficient, similar algorithms. In addition, the authors treat the problem of creating ontologies automatically, starting from Medical Subject Headings (MeSH). They present some efficient and relevant annotation models and also the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Based on a text query, the system retrieves the images that contain objects described by the keywords.
Elucidating high-dimensional cancer hallmark annotation via enriched ontology.

Science.gov (United States)

Yan, Shankai; Wong, Ka-Chun

2017-09-01

Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.
Rfam: annotating families of non-coding RNA sequences.

Science.gov (United States)

Daub, Jennifer; Eberhardt, Ruth Y; Tate, John G; Burge, Sarah W

2015-01-01

The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.
Fifty-year spatiotemporal analysis of landscape changes in the Mont Saint-Hilaire UNESCO Biosphere Reserve (Quebec, Canada).

Science.gov (United States)

Béliveau, Marc; Germain, Daniel; Ianăş, Ana-Neli

2017-05-01

Diachronic analysis with a GIS-based classification of land-use changes based on aerial photographs, orthophotos, topographic maps, geotechnical reports, urban plans, and using landscape metrics has permitted insight into the driving forces responsible for landscape fragmentation in the Mont Saint-Hilaire (MSH) Biosphere Reserve over the period 1958-2015. Although the occurrence of exogenous factors, such as extreme weather and fires, can have a significant influence on the fragmentation of the territory in time and space, the accelerated development of the built environment (+470%) is nevertheless found to be primarily responsible for landscape fragmentation and the loss of areas formerly occupied by orchards, agriculture, and woodlands. The landscape metrics used corroborate these results, with a simplification of the shape of polygons, and once again reveal the difficulties of harmonizing different land uses. MSH has become somewhat of a forest island in a sea of residential development and agriculture. To counter this isolation of fragmented habitat components, forest corridors have been proposed and developed for the Biosphere Reserve and particularly for the core area. Two corridors, to the north and south, are used to connect the protected area and other wooded areas at the regional scale, in order to promote genetic exchange between populations of various species. In that regard, the forest buffer zone around the hill continues to play a key role and has great ecological value for species and ecological preservation and conservation. However, appropriate management and landscape preservation actions should recognize and focus on landscape composition and the associated geographical configuration.
Qualifying Urban Landscapes

DEFF Research Database (Denmark)

Juel Clemmensen, Thomas; Daugaard, Morten; Nielsen, Tom

This paper is based on a research project aimed at contributing to the qualification of the aesthetical value of the contemporary urban landscape. We see our work as part of a tradition within the architectural profession of making explorative projects, which combines analysis of the contemporary...
Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

Directory of Open Access Journals (Sweden)

McCarthy Fiona M

2007-11-01

Background: The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results: We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion: We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and
On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report.

Directory of Open Access Journals (Sweden)

Paul D Thomas

A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: (1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and (2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the
Annotating risk factors for heart disease in clinical narratives for diabetic patients.

Science.gov (United States)

Stubbs, Amber; Uzuner, Özlem

2015-12-01

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task for this track with the goal of balancing annotation load and time with quality, so as to generate a gold standard corpus that can benefit a clinically-relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, the agreement of annotators with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations generated for each record in each longitudinal EMR in this corpus provide information that can support studies of progression of heart disease risk factors in the included patients over time. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records. Copyright © 2015 Elsevier Inc. All rights reserved.
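The two quantitative steps mentioned in this abstract, a gold standard set by majority voting and system scoring by micro-averaged F1, can be illustrated with a minimal Python sketch. The label names, the record layout (one set of risk-factor labels per record) and the function names are assumptions made for illustration; they are not taken from the i2b2/UTHealth annotation guidelines or evaluation scripts.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most annotators (ties go to the first label seen)."""
    return Counter(labels).most_common(1)[0][0]


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives over all records.

    gold and predicted are parallel lists holding one set of labels per record.
    """
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical votes from three annotators on one risk factor in one record.
consensus = majority_vote(["present", "present", "absent"])

# Hypothetical document-level labels for three records.
gold = [{"hypertension", "smoker"}, {"diabetes"}, {"obesity"}]
predicted = [{"hypertension"}, {"diabetes", "smoker"}, {"obesity"}]
score = micro_f1(gold, predicted)
```

In this toy input the pooled counts are 3 true positives, 1 false positive and 1 false negative, so precision, recall and micro-averaged F1 all equal 0.75.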
Landscape dynamics analysis in Iasi Metropolitan Area (Romania) using remote sensing data

Directory of Open Access Journals (Sweden)

CĂTĂLIN CÎMPIANU

2013-08-01

The present paper focuses on the observation and quantification of land cover changes in Iasi Metropolitan Area during 1993-2009. The analysis is centered upon the built-up space dynamics and includes the detection of its extension directions and the measurement of its structural changes by landscape metrics. In order to obtain the land cover data, some remote sensing images were processed by supervised classification and Normalized Difference Vegetation Index (NDVI). At the end of the study, a synthetic statistical analysis of the change dynamics is performed at commune level, in order to compare the administrative units by the intensity of land cover dynamics.
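For readers unfamiliar with the index named in this abstract, NDVI is conventionally defined per pixel as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectances. The sketch below is a generic illustration of that formula in plain Python; the reflectance values are invented, and nothing here reproduces the authors' processing chain or their supervised classification.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel.

    Returns 0.0 when both bands are zero to avoid dividing by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2-D lists of reflectances."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Invented 2 x 2 reflectance grids (assumed values, for illustration only).
nir_band = [[0.50, 0.40], [0.30, 0.10]]
red_band = [[0.10, 0.20], [0.30, 0.30]]
grid = ndvi_grid(nir_band, red_band)
```

Values near +1 typically indicate dense green vegetation, while values near zero or below are typical of built-up surfaces, bare soil or water, which is why the index is useful for tracking built-up expansion.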
Annotating Emotions in Meetings

NARCIS (Netherlands)

Reidsma, Dennis; Heylen, Dirk K.J.; Ordelman, Roeland J.F.

We present the results of two trials testing procedures for the annotation of emotion and mental state of the AMI corpus. The first procedure is an adaptation of the FeelTrace method, focusing on a continuous labelling of emotion dimensions. The second method is centered around more discrete
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Directory of Open Access Journals (Sweden)

Parichit Sharma

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Science.gov (United States)

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design

The Analysis of Tree Species Distribution Information Extraction and Landscape Pattern Based on Remote Sensing Images

Directory of Open Access Journals (Sweden)

Yi Zeng

2017-08-01

The forest ecosystem is the largest terrestrial vegetation type and plays an irreplaceable role owing to its unique value. At the landscape scale, research on forest landscape patterns has become a current hot spot, in which the study of forest canopy structure is very important. Canopy structure determines the process and strength of forest energy flow, which to some extent influences the ecosystem's regulation of climate and species diversity. Extracting the factors that influence canopy structure and analyzing the vegetation distribution pattern are therefore especially important. To address these problems, remote sensing technology, which is superior to other technical means because of its timeliness and large-scale monitoring capability, is applied in this study. Taking Lingkong Mountain as the study area, the paper uses remote sensing imagery to analyze the forest distribution pattern and obtain the spatial characteristics of the canopy structure distribution, with DEM data serving as the basic data for extracting the factors that influence canopy structure. The pattern of tree distribution is further analyzed using terrain parameters, spatial analysis tools and quantitative simulation of surface processes. The Hydrological Analysis tool is used to build a distributed hydrological model, and the corresponding algorithm is applied to determine surface water flow paths, the river network and basin boundaries. Results show that the dominant tree species are distributed in patches at the landscape scale and that their distribution shows spatial heterogeneity closely related to terrain factors. After overlay analysis of aspect and slope, respectively, with the forest distribution pattern, the areas most suitable for stand growth and with better growing conditions are obtained.
Consumer energy research: an annotated bibliography. Vol. 3

Energy Technology Data Exchange (ETDEWEB)

Anderson, D.C.; McDougall, G.H.G.

1983-04-01

This annotated bibliography attempts to provide a comprehensive package of existing information in consumer related energy research. A concentrated effort was made to collect unpublished material as well as material from journals and other sources, including governments, utilities, research institutes and private firms. A deliberate effort was made to include agencies outside North America. For the most part the bibliography is limited to annotations of empirical studies. However, it includes a number of descriptive reports which appear to make a significant contribution to understanding consumers and energy use. The format of the annotations displays the author, date of publication, title and source of the study. Annotations of empirical studies are divided into four parts: objectives, methods, variables and findings/implications. Care was taken to provide a reasonable amount of detail in the annotations to enable the reader to understand the methodology, the results and the degree to which the implications of the study can be generalized to other situations. Studies are arranged alphabetically by author. The content of the studies reviewed is classified in a series of tables which are intended to provide a summary of sources, types and foci of the various studies. These tables are intended to aid researchers interested in specific topics to locate those studies most relevant to their work. The studies are categorized using a number of different classification criteria, for example, methodology used, type of energy form, type of policy initiative, and type of consumer activity. A general overview of the studies is also presented. 17 tabs.
Expanded microbial genome coverage and improved protein family annotation in the COG database.

Science.gov (United States)

Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

2015-01-01

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the
Systematically profiling and annotating long intergenic non-coding RNAs in human embryonic stem cell.

Science.gov (United States)

Tang, Xing; Hou, Mei; Ding, Yang; Li, Zhaohui; Ren, Lichen; Gao, Ge

2013-01-01

Although more and more long intergenic non-coding RNAs (lincRNAs) have been found to play important roles in both maintaining pluripotency and regulating differentiation, how these lincRNAs define and drive cell fate decisions on a global scale is still mostly elusive. Systematic profiling and comprehensive annotation of embryonic stem cell lincRNAs may not only bring a clearer big picture of these novel regulators but also shed light on their functions. Based on multiple RNA-Seq datasets, we systematically identified 300 human embryonic stem cell lincRNAs (hES lincRNAs), of which about one fourth (78 out of 300) were further found to be preferentially expressed in human ES cells. Functional analysis showed that they were preferentially involved in several early-development related biological processes. Comparative genomics analysis further suggested that around half of the identified hES lincRNAs were conserved in mouse. To facilitate further investigation of these hES lincRNAs, we constructed an online portal for biologists to access all their sequences and annotations interactively. In addition to navigation through a genome browser interface, users can also locate lincRNAs through an advanced query interface based on both keywords and expression profiles, and analyze results through multiple tools. By integrating multiple RNA-Seq datasets, we systematically characterized and annotated 300 hES lincRNAs. A fully functional web portal is freely available at http://scbrowse.cbi.pku.edu.cn. As the first global profiling and annotation of human embryonic stem cell lincRNAs, this work aims to provide a valuable resource for both experimental biologists and bioinformaticians.
Landscape Analysis of Drone Congregation Areas of the Honey Bee, Apis mellifera

Science.gov (United States)

Galindo-Cardona, Alberto; Monmany, A. Carolina; Moreno-Jackson, Rafiné; Rivera-Rivera, Carlos; Huertas-Dones, Carlos; Caicedo-Quiroga, Laura; Giray, Tugrul

2012-01-01

Male honey bees fly and gather at Drone Congregation Areas (DCAs), where drones and queens mate in flight. DCAs occur in places with presumably characteristic features. Using previously described landscape characteristics and observations on flight direction of drones in nearby apiaries, 36 candidate locations were chosen across the main island of Puerto Rico. At these locations, the presence or absence of DCAs was tested by lifting a helium balloon equipped with queen-sex-pheromone-impregnated bait, and visually determining the presence of high numbers of drones. Because of the wide distribution of honey bees in Puerto Rico, it was expected that most of the potential DCAs would be used as such by drones and queens from nearby colonies. Eight DCAs were found in the 36 candidate locations. Locations with and without DCAs were compared in a landscape analysis including characteristics that were described to be associated with DCAs and others. Aspect (direction of slope) and density of trails were found to be significantly associated with the presence of DCAs. PMID:23451901
Image annotation based on positive-negative instances learning

Science.gov (United States)

Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping

2017-07-01

Automatic image annotation remains a difficult task in computer vision; its main purpose is to help manage the massive number of images on the Internet and to assist intelligent retrieval. This paper designs a new image annotation model based on a visual bag of words, using low-level features such as color and texture information as well as mid-level features such as SIFT, and combining pic2pic, label2pic and label2label correlations to measure the degree of correlation between labels and images. We aim to prune the specific features for each single label and formalize the annotation task as a learning process based on Positive-Negative Instances Learning. Experiments are performed on the Corel5K dataset and give quite promising results compared with other existing methods.
An Annotated Dataset of 14 Meat Images

DEFF Research Database (Denmark)

Stegmann, Mikkel Bille

2002-01-01

This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.
Editorial: What do we mean by "landscape"?

Science.gov (United States)

Paul H. Gobster; Wei-Ning Xiang

2012-01-01

As a prelude to revising the Aims and Scope of Landscape and Urban Planning (LAND), our last editorial discussed the journal's "intellectual landscape" as revealed by an analysis of conceptual and proximal relationships between articles published in LAND and 50 other research journals. The six conceptual themes we identified--ecology, planning and management...
Using Annotated Conceptual Models to Derive Information System Implementations

Directory of Open Access Journals (Sweden)

Anthony Berglas

1994-05-01

Producing production quality information systems from conceptual descriptions is a time consuming process that employs many of the world's programmers. Although most of this programming is fairly routine, the process has not been amenable to simple automation because conceptual models do not provide sufficient parameters to make all the implementation decisions that are required, and numerous special cases arise in practice. Most commercial CASE tools address these problems by essentially implementing a waterfall model in which the development proceeds from analysis through design, layout and coding phases in a partially automated manner, but the analyst/programmer must heavily edit each intermediate stage. This paper demonstrates that by recognising the nature of information systems, it is possible to specify applications completely using a conceptual model that has been annotated with additional parameters that guide automated implementation. More importantly, it will be argued that a manageable number of annotations are sufficient to implement realistic applications, and techniques will be described that enabled the author's commercial CASE tool, the Intelligent Developer, to automate implementation without requiring complex theorem proving technology.
EST-PAC a web package for EST annotation and protein sequence prediction

Directory of Open Access Journals (Sweden)

Strahm Yvan

2006-10-01

With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: (1) searching local or remote biological databases for sequence similarities using Blast services, (2) predicting protein coding sequence from EST data and (3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.
Genome-Wide Detection and Analysis of Multifunctional Genes

Science.gov (United States)

Pritykin, Yuri; Ghersi, Dario; Singh, Mona

2015-01-01

Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
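The selection rule stated in this abstract, keeping genes that play a role in at least two distinct biological processes, can be illustrated with a toy Python sketch. This is only a simplified reading of the abstract: it treats two process labels as distinct whenever their names differ, whereas a real analysis would also have to account for related terms in the ontology. The gene identifiers, process names and the `min_processes` parameter are hypothetical.

```python
def multifunctional_genes(annotations, min_processes=2):
    """Return, in sorted order, the genes annotated with at least min_processes distinct labels.

    annotations maps a gene identifier to an iterable of process labels;
    repeated labels for the same gene are counted once.
    """
    return sorted(
        gene
        for gene, processes in annotations.items()
        if len(set(processes)) >= min_processes
    )


# Hypothetical annotation table (gene names and processes are invented).
annotations = {
    "geneA": ["DNA repair", "cell cycle"],
    "geneB": ["translation", "translation"],
    "geneC": ["signal transduction", "apoptosis", "cell cycle"],
    "geneD": [],
}
selected = multifunctional_genes(annotations)
```

Here `geneB` is excluded because its two annotations name the same process, which is the kind of duplicate a real pipeline would need to collapse before counting.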
BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

2015-01-01

We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
ANALYSIS ON THE INFLUENCE OF ACCUMULATION EFFECT OF LANDSCAPE COLOR ON TRAFFIC SAFETY IN THE FOGGY SECTIONS OF EXPRESSWAYS

OpenAIRE

Xilei Li; Boming Tang; Qianghui Song

2017-01-01

The landscape color of expressways has a significant impact on a driver's visual response, thus affecting the incidence of expressway traffic accidents. Although this problem has been analyzed and discussed from different angles, quantitative analysis of the accumulation effect of landscape color on traffic safety in foggy sections of expressways is rare. In this paper, the color combination and stroboflash of fog lamps on both sides of the road were designed from the ...
[Application of land economic ecological niche in landscape pattern analysis at county level: A case study of Jinghe County in Xinjiang, China].

Science.gov (United States)

Yu, Hai-yang; Zhang, Fei; Wang, Juan; Zhou, Mei

2015-12-01

This article uses the theory of land economic ecological niche to analyze regional landscape pattern, with the aim of providing a new method for characterizing and representing landscape pattern. The ecologically fragile Jinghe County region was selected as the study area, and Landsat images from 1990, 1998, 2011 and 2013 were used as remote sensing data. The land economic ecological niche of each land use type, calculated using ecostate-ecorole theory and combined with landscape ecology theory, was applied to county-level landscape pattern analysis. The results showed that, during the study period, the land economic ecological niches of farmland, construction land, and grassland were significantly correlated with landscape parameters including landscape patch number (NP), aggregated index (AI), fragmented index (FN) and fractal dimension (FD). Regional landscape change was driven by changes in land economic ecological niche, and the trend of economic development in Jinghe County could be represented by land economic ecological niche change. Land economic ecological niche was closely related to the land use types that yield direct economic benefits and, when combined with the landscape indices, could explain the landscape pattern characteristics of Jinghe County well.
Connecting Brabant's cover sand landscapes through landscape history

Science.gov (United States)

Heskes, Erik; van den Ancker, Hanneke; Jungerius, Pieter Dirk; Harthoorn, Jaap; Maes, Bert; Leenders, Karel; de Jongh, Piet; Kluiving, Sjoerd; van den Oetelaar, Ger

2015-04-01

Noord-Brabant has the largest variety of cover sand landscapes in The Netherlands, and probably in Western Europe. During the Last Ice Age the area was not covered by land ice and a polar desert developed in which sand dunes buried the existing river landscapes. Some of these polar dune landscapes experienced a geomorphological and soil development that remained virtually untouched up to the present day, such as the low parabolic dunes of the Strabrechtse Heide or the later and higher dunes of the Oisterwijkse Vennen. As Noord-Brabant lies on the fringe of a tectonic basin, the thickness of cover sand deposits in the Centrale Slenk, part of a rift through Europe, amounts to up to 20 metres. Cover sand deposits along the fault lines cause the special phenomenon of 'wijst' to develop, in which the higher grounds are wetter than the bordering lower grounds. Since 4000 BC humans have settled in these cover sand landscapes and made use of their small-scale variety. Examples are the prehistoric finds on the flanks and the historic towns on top of the 'donken' in northwest Noord-Brabant, where the cover sand landscapes are buried by river and marine deposits and only the peaks of the dunes protrude as donken; the church of Handel, which is built beside a 'wijst' source and has been a site of pilgrimage since living memory; and the 'essen' and plaggen agriculture that developed along the stream valleys of Noord-Brabant from 1300 AD onwards, giving rise to geomorphological features such as 'randwallen' and plaggen soils of more than a metre thickness. Each region of Brabant has its own approach to attracting tourists and has not yet used this common landscape history to connect, manage and promote its territory. We propose a landscape-historical approach to develop a national or European Geopark Brabant's cover sand landscapes, in which each region focuses on a specific part of the landscape history of Brabant, that stretches from the Late Weichselian polar desert when the dune
Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

Science.gov (United States)

Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

2017-06-26

The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis
Landscape Valuation of Environmental Amenities throughout the Application of Direct and Indirect Methods

Directory of Open Access Journals (Sweden)

Luís Loures

2015-01-01

Full Text Available Landscape design, construction and management should no longer be the result of superficial approaches based exclusively on designers’ and planners’ ideas. This research starts with the assumption that the aesthetic component constitutes an essential attribute for better understanding and evaluating landscapes. This study analyzes the aesthetic quality and economic valuation of the Lower Guadiana river landscape, through the application of direct and indirect landscape evaluation methods. To gauge more than expert opinion, it is supported by public participation techniques that capture the opinions and perceptions of site visitors/users. The present research considered the analysis of six landscape subunits regarding landscape quality, fragility and visual absorption capacity. The obtained results showed that there are significant differences between the perceptions of the general public and experts’ analysis. Touristic Complexes and Golf Courses had high visual quality, while Agricultural and Production Areas had high visual fragility. Moreover, the performed analysis made clear that the combined use of landscape assessment methods is suited to this type of study, since it enables quantifying the value of existence, management and maintenance of particular environmental assets and/or services.
Diversity Indices as Measures of Functional Annotation Methods in Metagenomics Studies

KAUST Repository

Jankovic, Boris R.

2016-01-26

Applications of high-throughput techniques in metagenomics studies produce massive amounts of data. Fragments of genomic, transcriptomic and proteomic molecules are all found in metagenomics samples. Laborious and meticulous effort in sequencing and functional annotation are then required to, amongst other objectives, reconstruct a taxonomic map of the environment that metagenomics samples were taken from. In addition to computational challenges faced by metagenomics studies, the analysis is further complicated by the presence of contaminants in the samples, potentially resulting in skewed taxonomic analysis. The functional annotation in metagenomics can utilize all available omics data and therefore different methods that are associated with a particular type of data. For example, protein-coding DNA, non-coding RNA or ribosomal RNA data can be used in such an analysis. These methods would have their advantages and disadvantages and the question of comparison among them naturally arises. There are several criteria that can be used when performing such a comparison. Loosely speaking, methods can be evaluated in terms of computational complexity or in terms of the expected biological accuracy. We propose that the concept of diversity that is used in the ecosystems and species diversity studies can be successfully used in evaluating certain aspects of the methods employed in metagenomics studies. We show that when applying the concept of Hill’s diversity, the analysis of variations in the diversity order provides valuable clues into the robustness of methods used in the taxonomical analysis.
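The Hill diversity mentioned in this abstract can be sketched as follows. This is an illustrative example rather than the author's implementation: it uses the standard definition of the Hill number of order q, D_q = (sum_i p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of Shannon entropy. The function name and the sample counts are hypothetical.

```python
import math

def hill_number(counts, q):
    """Hill diversity of order q for a list of non-negative abundance counts."""
    total = sum(counts)
    # Relative abundances of the categories that are actually present.
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical taxon counts produced by two annotation methods.
even_profile = [10, 10, 10, 10]
skewed_profile = [97, 1, 1, 1]

for q in (0, 1, 2):
    print(q, hill_number(even_profile, q), hill_number(skewed_profile, q))
```

For a perfectly even profile the value equals the number of categories at every order, whereas for a skewed profile it falls as q increases. How quickly it falls across orders is one way to read the "variations in the diversity order" the abstract refers to.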
Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

Science.gov (United States)

Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

2012-08-01

This paper presents a novel multiple-keyword annotation method for medical images, keyword-based medical image retrieval, and a relevance feedback method for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center symmetric-local binary patterns with random forests. For keyword-based image retrieval, our retrieval system uses the confidence score that is assigned to each annotated keyword by combining probabilities of random forests with a predefined body relation graph. To overcome the limitation of keyword-based image retrieval, we combine our image retrieval system with a relevance feedback mechanism based on visual features and a pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

Directory of Open Access Journals (Sweden)

Kumar Parijat Tripathi

Full Text Available RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report of functional and gene ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non-coding RNA (ncRNA) by ab initio methods) helps to identify the non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

Science.gov (United States)

Holt, Carson; Yandell, Mark

2011-12-22

Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.
Annotation: The Savant Syndrome

Science.gov (United States)

Heaton, Pamela; Wallace, Gregory L.

2004-01-01

Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…
Assessment of landscape diversity and determination of landscape hotspots - a case of Slovenia

Science.gov (United States)

Perko, Drago; Ciglič, Rok; Hrvatin, Mauro

2017-04-01

Areas with high landscape diversity can be regarded as landscape hotspots and, conversely, areas with low landscape diversity can be marked as landscape coldspots. The main purpose of this paper is to use a quantitative geoinformatic approach to identify parts of our test area (the country of Slovenia) that can be described as very diverse according to natural landscapes and natural elements. We used different digital raster data of natural elements and landscape classifications and defined landscape diversity and landscape hotspots. We defined diversity for each raster pixel by counting the number of different unique types of landscape elements and types of landscapes in its neighborhood. Namely, the method was used separately to define diversity according to natural elements (types of relief forms, rocks, and vegetation) and diversity according to existing geographical landscape classifications of Slovenia (types of landscapes). In both cases the one-tenth of Slovenia's surface with the highest landscape diversity was defined as landscape hotspots. The same applies to the coldspots. Additionally, we tested the same method of counting different types of landscapes within a certain radius for the area of Europe in order to find areas that are more diverse at the continental level. By doing so we were able to find areas that have a similar level of diversity to Slovenia according to different European landscape classifications. Areas with high landscape diversity may have an advantage in economic development, especially in tourism. Such areas are also important for biodiversity, habitat, and species diversity. On the other hand, localities where various natural influences mix can also be areas where it is hard to transfer best practices from one place to another because of the varying responses of the landscapes to human intervention. Thus it is important to know where areas with high landscape diversity are.
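The per-pixel counting described in this abstract can be illustrated with a short sketch. It is not the authors' code: it assumes a square window rather than a circular radius, treats the raster as a plain list of lists of type codes, and uses a tiny hypothetical grid. The hotspot rule (top tenth of cells by diversity) follows the abstract, but how ties are handled is an assumption.

```python
def neighborhood_diversity(grid, radius):
    """Count distinct type codes in a square window around every cell."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            seen = set()
            # The window is clipped at the raster edges.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    seen.add(grid[rr][cc])
            row_out.append(len(seen))
        result.append(row_out)
    return result

def hotspot_mask(diversity, fraction=0.1):
    """Mark cells whose diversity reaches the cutoff of the top `fraction`.

    Assumption: tied cells at the cutoff are all included, so the mask can
    cover slightly more than the requested fraction.
    """
    values = sorted((v for row in diversity for v in row), reverse=True)
    cutoff = values[max(1, int(len(values) * fraction)) - 1]
    return [[v >= cutoff for v in row] for row in diversity]

# Hypothetical 3 x 3 raster of landscape type codes.
grid = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, 2],
]
diversity = neighborhood_diversity(grid, radius=1)
hotspots = hotspot_mask(diversity)
print(diversity)
```

Coldspots could be derived the same way by sorting in ascending order. A real analysis would run this over several rasters (relief, rock, vegetation) and on far larger grids, where a vectorised or windowed implementation would be preferable.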
Two-generation analysis of pollen flow across a landscape. I. Male gamete heterogeneity among females.

Science.gov (United States)

Smouse, P E; Dyer, R J; Westfall, R D; Sork, V L

2001-02-01

Gene flow is a key factor in the spatial genetic structure in spatially distributed species. Evolutionary biologists interested in microevolutionary processes and conservation biologists interested in the impact of landscape change require a method that measures the real-time process of gene movement. We present a novel two-generation (parent-offspring) approach to the study of genetic structure (TwoGener) that allows us to quantify heterogeneity among the male gamete pools sampled by maternal trees scattered across the landscape and to estimate mean pollination distance and effective neighborhood size. First, we describe the model's elements: genetic distance matrices to estimate intergametic distances, molecular analysis of variance to determine whether pollen profiles differ among mothers, and optimal sampling considerations. Second, we evaluate the model's effectiveness by simulating spatially distributed populations. Spatial heterogeneity in male gametes can be estimated by phiFT, a male gametic analogue of Wright's F(ST) and an inverse function of mean pollination distance. We illustrate TwoGener in cases where the male gamete can be categorically or ambiguously determined. This approach does not require the high level of genetic resolution needed by parentage analysis, but the ambiguous case is vulnerable to bias in the absence of adequate genetic resolution. Finally, we apply TwoGener to an empirical study of Quercus alba in Missouri Ozark forests. We find that phiFT = 0.06, translating into about eight effective pollen donors per female and an effective pollination neighborhood as a circle of radius about 17 m. Effective pollen movement in Q. alba is more restricted than previously realized, even though pollen is capable of moving large distances. This case study illustrates that, with a modest investment in field survey and laboratory analysis, the TwoGener approach permits inferences about landscape-level gene movements.
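The arithmetic behind "about eight effective pollen donors" can be checked with a small sketch. It assumes the relation N_ep = 1 / (2 * phiFT), which reproduces the figure quoted in the abstract but is not stated there explicitly. The neighborhood area is plain circle geometry for the quoted 17 m radius. Variable and function names are illustrative.

```python
import math

def effective_pollen_donors(phi_ft):
    """Effective number of pollen donors, assuming N_ep = 1 / (2 * phiFT)."""
    return 1.0 / (2.0 * phi_ft)

phi_ft = 0.06      # value reported in the abstract
radius_m = 17.0    # neighborhood radius reported in the abstract

n_ep = effective_pollen_donors(phi_ft)
area_m2 = math.pi * radius_m ** 2  # area of a circle with that radius

print(f"N_ep ~ {n_ep:.1f} donors, neighborhood area ~ {area_m2:.0f} m^2")
```

Under this assumed relation, a larger phiFT (more heterogeneity among maternal pollen pools) implies fewer effective donors, which is consistent with the abstract's description of phiFT as an inverse function of mean pollination distance.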
First generation annotations for the fathead minnow (Pimephales promelas) genome

Science.gov (United States)

Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...
Patterns and drivers of land use change in selected European rural landscapes

DEFF Research Database (Denmark)

Kristensen, Søren Bech Pilgaard; Busck, Anne Gravsholt; van der Sluis, Theo

2016-01-01

concerns are less dominant and many landscape and land use changes are undertaken to improve public goods or fulfil personal and family ambitions and values. This paper investigates the patterns of farm-level land use changes that occurred between 2002 and 2012 in three different landscape regions...... with their engagement in land use changes. Common to all areas is that agricultural production is under pressure due to physical or socio-economic challenges. The results indicate that relatively more nature or landscape features have been added by landowners than removed by them in the six study areas. Furthermore......, the analysis revealed that full-time landowners were responsible for the largest proportion of landscape change and that the areas involved differed greatly. The analysis also underlined the variety of European landscapes, as many landscape activities exhibited strong geographical patterns. A multivariate...
Learning topography with Tangible Landscape games

Science.gov (United States)

Petrasova, A.; Tabrizian, P.; Harmon, B. A.; Petras, V.; Millar, G.; Mitasova, H.; Meentemeyer, R. K.

2017-12-01

Understanding topography and its representations is crucial for correct interpretation and modeling of surface processes. However, novice earth science and landscape architecture students often find reading topographic maps challenging. As a result, many students struggle to comprehend more complex spatial concepts and processes such as flow accumulation or sediment transport. We developed and tested a new method for teaching hydrology, geomorphology, and grading using Tangible Landscape, a tangible interface for geospatial modeling. Tangible Landscape couples a physical and digital model of a landscape through a real-time cycle of hands-on modeling, 3D scanning, geospatial computation, and projection. With Tangible Landscape students can sculpt a projection-augmented topographic model of a landscape with their hands and use a variety of tangible objects to immediately see how they are changing geospatial analytics such as contours, profiles, water flow, or landform types. By feeling and manipulating the shape of the topography, while seeing projected geospatial analytics, students can intuitively learn about 3D topographic form, its representations, and how topography controls physical processes. Tangible Landscape is powered by GRASS GIS, an open source geospatial platform with extensive libraries for geospatial modeling and analysis. As such, Tangible Landscape can be used to design a wide range of learning experiences across a large number of geoscience disciplines. As part of a graduate level course that teaches grading, 16 students participated in a series of workshops, which were developed as serious games to encourage learning through structured play. These serious games included 1) diverting rain water to a specified location with minimal changes to landscape, 2) building different combinations of landforms, and 3) reconstructing landscapes based on projected contour information with feedback. In this poster, we will introduce Tangible Landscape, and
Experimental quantum control landscapes: Inherent monotonicity and artificial structure

International Nuclear Information System (INIS)

Roslund, Jonathan; Rabitz, Herschel

2009-01-01

Unconstrained searches over quantum control landscapes are theoretically predicted to generally exhibit trap-free monotonic behavior. This paper makes an explicit experimental demonstration of this intrinsic monotonicity for two controlled quantum systems: frequency unfiltered and filtered second-harmonic generation (SHG). For unfiltered SHG, the landscape is randomly sampled and interpolation of the data is found to be devoid of landscape traps up to the level of data noise. In the case of narrow-band-filtered SHG, trajectories are taken on the landscape to reveal a lack of traps. Although the filtered SHG landscape is trap free, it exhibits a rich local structure. A perturbation analysis around the top of these landscapes provides a basis to understand their topology. Despite the inherent trap-free nature of the landscapes, practical constraints placed on the controls can lead to the appearance of artificial structure arising from the resultant forced sampling of the landscape. This circumstance and the likely lack of knowledge about the detailed local landscape structure in most quantum control applications suggests that the a priori identification of globally successful (un)constrained curvilinear control variables may be a challenging task.
Control landscapes for observable preparation with open quantum systems

International Nuclear Information System (INIS)

Wu Rebing; Pechen, Alexander; Rabitz, Herschel; Hsieh, Michael; Tsou, Benjamin

2008-01-01

A quantum control landscape is defined as the observable as a function(al) of the system control variables. Such landscapes were introduced to provide a basis to understand the increasing number of successful experiments controlling quantum dynamics phenomena. This paper extends the concept to encompass the broader context of the environment having an influence. For the case that the open system dynamics are fully controllable, it is shown that the control landscape for open systems can be lifted to the analysis of an equivalent auxiliary landscape of a closed composite system that contains the environmental interactions. This inherent connection can be analyzed to provide relevant information about the topology of the original open system landscape. Application to the optimization of an observable expectation value reveals the same landscape simplicity observed in former studies on closed systems. In particular, no false suboptimal traps exist in the system control landscape when seeking to optimize an observable, even in the presence of complex environments. Moreover, a quantitative study of the control landscape of a system interacting with a thermal environment shows that the enhanced controllability attainable with open dynamics significantly broadens the range of the achievable observable values over the control landscape
Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling

Directory of Open Access Journals (Sweden)

Marinela Capanu

2015-05-01

Full Text Available Identifying the small number of rare causal variants contributing to disease has been a major focus of investigation in recent years, but represents a formidable statistical challenge due to the rare frequencies with which these variants are observed. In this commentary we draw attention to a formal statistical framework, namely hierarchical modeling, to combine functional genomic annotations with sequencing data with the objective of enhancing our ability to identify rare causal variants. Using simulations we show that in all configurations studied, the hierarchical modeling approach has superior discriminatory ability compared to a recently proposed aggregate measure of deleteriousness, the Combined Annotation-Dependent Depletion (CADD) score, supporting our premise that aggregate functional genomic measures can more accurately identify causal variants when used in conjunction with sequencing data through a hierarchical modeling approach.
Classification of Farmland Landscape Structure in Multiple Scales

Science.gov (United States)

Jiang, P.; Cheng, Q.; Li, M.

2017-12-01

Farmland is one of the basic terrestrial resources that support the development and survival of human beings and thus plays a crucial role in the national security of every country. Pattern change is the intuitive spatial representation of variation in the scale and quality of farmland. Through the characteristic development of spatial shapes as well as through changes in system structures, functions and so on, farmland landscape patterns may indicate the landscape health level. Currently, it is still difficult to use an index model to perform positioning analyses of landscape pattern changes that reflect the landscape structure variations of farmland. Drawing on a number of spatial properties, such as locations and adjacency relations, distance decay, and fringe effect, and on the patch-corridor-matrix model, this study defines a type system of farmland landscape structure at the national, provincial, and city levels. According to such a definition, the classification model of farmland landscape-structure type at the pixel scale is developed and validated based on mathematical-morphology concepts and on spatial-analysis methods. Then, the laws that govern farmland landscape-pattern change at multiple scales are analyzed from the perspectives of spatial heterogeneity, spatio-temporal evolution, and function transformation. The result shows that the classification model of farmland landscape-structure type can reflect farmland landscape-pattern change and its effects on farmland production function. Moreover, farmland landscape change at different scales displayed significant disparity in zonality, both within specific regions and in urban-rural areas.
Landscape degradation at different spatial scales caused by aridification

Directory of Open Access Journals (Sweden)

Meyer Burghard Christian

2017-12-01

Full Text Available Landscape responses to degradation caused by aridification bring the landscape system into a new equilibrium state. The system transformation may entail irreversible changes to its constituting parameters. This paper analyses the impact of aridification on landscape degradation processes in the sand-covered landscapes of the Hungarian Danube-Tisza Interfluve region at the regional, landscape, and local site scales. Changes in groundwater level (well data, lake surface area (Modified Normalized Difference Water Index and vegetation cover (Enhanced Vegetation Index were analysed over time periods of 12–60 years. Significant regional variation in decreasing groundwater levels is observed and limits the regional applicability of this indicator. Applying the lake surface area parameter from remote sensing data demonstrated greater utility, identifying several local lakes in the landscapes which have dried out. Analysis of the vegetation response indicated minor changes over the 2000–2014 time period and did not indicate a landscape system change. Landscape degradation as a result of changes in groundwater, vegetation, land cover and land use is clearly identified exclusively in local lake areas, but at the landscape scale, changes in the water balance are found in phases of system stability and transformation. Thresholds are identified to support policy and management towards landscape degradation neutrality.
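The lake surface indicator named in this abstract, the Modified Normalized Difference Water Index, can be sketched as below. The index formula, (Green - SWIR) / (Green + SWIR), is the standard definition. Everything else is an assumption made for illustration: the water threshold of 0, the 30 m pixel size, the reflectance values and the function names do not come from the paper.

```python
def mndwi(green, swir):
    """Modified Normalized Difference Water Index for one pixel."""
    denom = green + swir
    if denom == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (green - swir) / denom

def lake_area_m2(green_band, swir_band, pixel_size_m=30.0, threshold=0.0):
    """Estimate open-water area by counting pixels with MNDWI above a threshold.

    Assumptions for this sketch: bands are equally sized lists of lists of
    reflectance values, pixels are square, and MNDWI > 0 indicates water.
    """
    water_pixels = 0
    for green_row, swir_row in zip(green_band, swir_band):
        for g, s in zip(green_row, swir_row):
            if mndwi(g, s) > threshold:
                water_pixels += 1
    return water_pixels * pixel_size_m ** 2

# Hypothetical 2 x 2 scene: top row water-like, bottom row land-like.
green_band = [[0.10, 0.12], [0.10, 0.08]]
swir_band = [[0.02, 0.03], [0.25, 0.30]]

area = lake_area_m2(green_band, swir_band)
print(area)
```

Repeating this for scenes from different years would give the kind of lake-area time series the abstract describes, although a real workflow would also need cloud masking and a threshold tuned to the sensor.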
Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.

Science.gov (United States)

Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J

2014-01-01

Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
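The provisional-term workflow described in this abstract can be pictured with a small sketch. It is not Phenex code: the identifiers, the record layout and the `resolve_provisional` helper are all hypothetical, and only the idea (swap a provisional ID for a permanent one once the ontology has the term) comes from the abstract.

```python
# Hypothetical annotations; "TEMP:" marks a provisional term requested by a curator.
annotations = [
    {"character": "dorsal fin shape", "term": "TEMP:0001"},
    {"character": "pelvic fin presence", "term": "ONT:0000042"},
]


def resolve_provisional(annotations, resolved):
    """Return new annotation records with provisional IDs replaced.

    `resolved` maps provisional IDs to the permanent IDs assigned once the
    requested term has been added to the ontology. Unresolved terms are kept.
    """
    return [{**a, "term": resolved.get(a["term"], a["term"])} for a in annotations]


# Hypothetical mapping reported back after the ontology was updated.
resolved = {"TEMP:0001": "ONT:0000123"}
updated = resolve_provisional(annotations, resolved)
print(updated)
```

Returning new records instead of editing in place keeps the original annotations available for comparison, which is a design choice of this sketch only.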
Planetary Landscape Geography

Science.gov (United States)

Hargitai, H.

INTRODUCTION Landscape is one of the most often used categories in physical geography. The term "landshap" was introduced by Dutch painters in the 15-16th century. [1] The elements that build up a landscape (or environment) on Earth consist of natural (biogenic and abiogenic - lithologic, atmospheric, hydrologic) and artificial (anthropogenic) factors. Landscape is a complex system of these different elements. The same lithology makes different landscapes under different climatic conditions. If the same conditions are present, the same landscape type will appear. Landscapes build up a hierarchic system and cover the whole surface. On Earth, landscapes can be classified and qualified according to their characteristics: relief forms (morphology), and their potential economic value. Aesthetic and subjective parameters can also be considered. Using the data from landers and data from orbiters we can now classify planetary landscapes (these can be used as geologic mapping units as well). By looking at an unknown landscape, we can determine the processes that created it and its development history. This was the case in the Pathfinder/Sojourner panoramas. [2]. DISCUSSION Planetary landscape evolution. We can draw a raw landscape development history by adding the different landscape building elements to each other. This has a strong connection with the planet's thermal evolution (age of the planet or the present surface materials) and with orbital parameters (distance from the central star, orbit eccentricity etc). This way we can build a complex system in which we use different evolutional stages of lithologic, atmospheric, hydrologic and biogenic conditions which determine the given - Solar System or exoplanetary - landscape. Landscape elements. "Simple" landscapes can be found on asteroids: no linear horizon is present (not differentiated body, only impact structures), no atmosphere (therefore no atmospheric scattering - black sky as part of the landscape) and no
Applying information network analysis to fire-prone landscapes: implications for community resilience

Directory of Open Access Journals (Sweden)

Derric B. Jacobs

2017-03-01

Full Text Available Resilient communities promote trust, have well-developed networks, and can adapt to change. For rural communities in fire-prone landscapes, current resilience strategies may prove insufficient in light of increasing wildfire risks due to climate change. It is argued that, given the complexity of climate change, adaptations are best addressed at local levels where specific social, cultural, political, and economic conditions are matched with local risks and opportunities. Despite the importance of social networks as key attributes of community resilience, research using social network analysis on coupled human and natural systems is scarce. Furthermore, the extent to which local communities in fire-prone areas understand climate change risks, accept the likelihood of potential changes, and have the capacity to develop collaborative mitigation strategies is underexamined, yet these factors are imperative to community resiliency. We apply a social network framework to examine information networks that affect perceptions of wildfire and climate change in Central Oregon. Data were collected using a mailed questionnaire. Analysis focused on the residents' information networks that are used to gain awareness of governmental activities and measures of community social capital. A two-mode network analysis was used to uncover information exchanges. Results suggest that the general public develops perceptions about climate change based on complex social and cultural systems rather than as patrons of scientific inquiry and understanding. It appears that perceptions about climate change itself may not be the limiting factor in these communities' adaptive capacity, but rather how they perceive local risks. We provide a novel methodological approach in understanding rural community adaptation and resilience in fire-prone landscapes and offer a framework for future studies.
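A two-mode network links two kinds of nodes. The sketch below assumes, purely for illustration, that the two modes are residents and the information sources they report using, and projects the network onto the sources. The abstract does not describe the study's data layout or software, so the names and the projection step are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical questionnaire responses: resident -> information sources used.
responses = {
    "resident_1": {"local newspaper", "fire district", "neighbors"},
    "resident_2": {"fire district", "neighbors"},
    "resident_3": {"local newspaper", "fire district"},
}


def project_sources(responses):
    """One-mode projection: edge weight = number of residents using both sources."""
    weights = defaultdict(int)
    for sources in responses.values():
        for a, b in combinations(sorted(sources), 2):
            weights[(a, b)] += 1
    return dict(weights)


shared = project_sources(responses)
for pair, count in sorted(shared.items()):
    print(pair, count)
```

Source pairs with high weights are the ones that tend to reach the same residents, which is one simple way to look at information exchange in such a network.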
Automated evaluation of annotators for museum collections using subjective logic

NARCIS (Netherlands)

Ceolin, D.; Nottamkandath, A.; Fokkink, W.J.; Dimitrakos, Th.; Moona, R.; Patel, Dh.; Harrison McKnight, D.

2012-01-01

Museums are rapidly digitizing their collections, and face a huge challenge to annotate every digitized artifact in store. Therefore they are opening up their archives for receiving annotations from experts world-wide. This paper presents an architecture for choosing the most eligible set of
Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations.

Science.gov (United States)

Cario, Clinton L; Witte, John S

2018-03-15

As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks. We present orchid, a Python-based software package for the management, annotation and machine learning of cancer mutations. Building on technologies of parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier. Orchid and our annotated tumor mutation database are freely available at https://github.com/wittelab/orchid. Software is implemented in Python 2.7, and makes use of MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution. Contact: JWitte@ucsf.edu. Supplementary data are available at Bioinformatics online.
Annotation of the protein coding regions of the equine genome

DEFF Research Database (Denmark)

Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

2015-01-01

Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...
Changing Landscapes, Changing Landscape's Story

Czech Academy of Sciences Publication Activity Database

Lapka, Miloslav; Cudlínová, Eva

2003-01-01

Vol. 28, No. 3 (2003), pp. 323-328 ISSN 0142-6397. [Symposium on Sustainable Landscapes in an Enlarged Europe. Nové Hrady, 12.09.2001-14.09.2001] R&D Projects: GA MŠk ME 530 Grant - others:GA-(XE) QLK5-CT-2000-01211-SPRITE Institutional research plan: CEZ:AV0Z5039906 Keywords : Landscape stability * narrative approach * socio-economic typology Subject RIV: DO - Wilderness Conservation
A Classification of Landscape Services to Support Local Landscape Planning

Directory of Open Access Journals (Sweden)

María Vallés-Planells

2014-03-01

Full Text Available The ecosystem services approach has proven successful in measuring the contributions of nature and greenery to human well-being. Ecosystems have an effect on quality of life, but landscapes, as a broader concept, may also contribute to people's well-being. The concept of landscape services, compared to ecosystem services, involves the social dimension of landscape and the spatial pattern resulting from both natural and human processes in the provision of benefits for human well-being. Our aim is to develop a classification for landscape services. The proposed typology of services is built on the Common International Classification of Ecosystem Services (CICES) and on a critical review of existing literature on human well-being dimensions, existing ecosystem service classifications, and landscape perception. Three themes of landscape services are defined, each divided into several groups: provisioning, regulation and maintenance, cultural and social life fulfillment, with the latter focusing on health, enjoyment, and personal and social fulfillment. Special emphasis is placed on cultural services, which are especially important when applied to landscape and which have received less attention.

Genome Annotation and Transcriptomics of Oil-Producing Algae

Science.gov (United States)

2015-03-16

AFRL-OSR-VA-TR-2015-0103. Genome Annotation and Transcriptomics of Oil-Producing Algae. Sabeeha Merchant, University of California Los Angeles. Final...2010 To 12-31-2014. Contract number: FA9550-10-1-0095. Abstract: Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some
Analysis on landscape pattern change and ecosystem services value of modern agriculture corridor: a case study of Jingcheng Highway

Science.gov (United States)

Liu, Bao; Gu, Xiaohe; Zhang, Jing; Du, Chong; Di, Xingcui

2009-10-01

Based on SPOT images and other geoscience data, this paper derives land use and land cover information for the Modern Agriculture Corridor from 2006 to 2008 using RS and GIS technology and analyses land use changes from a landscape ecology perspective. We then build a quantitative evaluation model that uses vegetation coverage as an adjustment coefficient to monitor changes in ecosystem services value. The results show that, in terms of landscape pattern indices, the landscape heterogeneity of the region is increasing, land use types are becoming more varied, and the degree of landscape fragmentation has increased; woodland, farmland and construction land play a leading role in the dynamic changes of the landscape. In terms of ecosystem service value, the total value of ecosystem services of the Modern Agriculture Corridor from 2006 to 2008 was 186, 188 and 193 million yuan, respectively, with an average annual rate of increase of 2%; ecosystem quality differs between seasons and is best in summer, which contributes 33% of the full-year value of ecosystem services; the average contribution rates of forest and water ecosystems are the highest, at 37% and 33%, respectively; and the increase in woodland, grassland and water area is the main reason for the enhancement of ecosystem services.
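The reported annual rate of about 2% can be checked against the three totals given in this abstract. The short calculation below assumes the values 186, 188 and 193 correspond to 2006, 2007 and 2008 in that order; the abstract does not say which averaging method the authors used, so two common ones are shown.

```python
# Total ecosystem service value in million yuan, as listed in the abstract
# (assignment to years 2006, 2007, 2008 is assumed from the order given).
values = {2006: 186, 2007: 188, 2008: 193}

years = sorted(values)
first, last = years[0], years[-1]

# Compound annual growth rate over the whole period.
cagr = (values[last] / values[first]) ** (1 / (last - first)) - 1

# Simple mean of the year-over-year relative changes, for comparison.
yoy = [values[b] / values[a] - 1 for a, b in zip(years, years[1:])]
mean_yoy = sum(yoy) / len(yoy)

print(f"CAGR: {cagr:.1%}, mean year-over-year change: {mean_yoy:.1%}")
```

Both measures come to roughly 1.9%, which is consistent with the rounded 2% stated in the abstract.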
Annotating smart environment sensor data for activity learning.

Science.gov (United States)

Szewcyzk, S; Dwan, K; Minor, B; Swedlove, B; Cook, D

2009-01-01

The pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track the activities that people perform at home. Machine learning techniques can perform this task, but the software algorithms rely upon large amounts of sample data that is correctly labeled with the corresponding activity. Labeling, or annotating, sensor data with the corresponding activity can be time consuming, may require input from the smart home resident, and is often inaccurate. Therefore, in this paper we investigate four alternative mechanisms for annotating sensor data with a corresponding activity label. We evaluate the alternative methods along the dimensions of annotation time, resident burden, and accuracy using sensor data collected in a real smart apartment.
LANDSCAPE PLANNING IN UKRAINE: THE FIRST LANDSCAPE-PLANNING PROGRAM

Directory of Open Access Journals (Sweden)

Leonid Rudenko

2013-01-01

Full Text Available The paper presents the results of the first project in Ukraine on landscape planning, an approach widely accepted in European countries. Under the project, implemented in 2010–2013, a landscape-planning program has been developed for the Cherkassy oblast. This is the first document of this kind in Ukraine. The program is mainly based on the experience of the German and Russian schools of landscape planning and on research and assessment conducted by the authors, which allowed identifying approaches to landscape planning, principles of the national policy, and characteristics and potential of environmentally friendly planning in Ukraine. The paper discusses the main phases of the work on the development of the landscape program for the oblast. It also identifies the main stages and key concepts and principles of landscape planning. The paper presents the results of integrated research on the identification and classification of conflicts in land use and the integral concept of the developmental goals for the oblast. The results can be the foundation for adopting management decisions and developing action plans for the lower hierarchical levels.
Experimental evidence of reorganizing landscape under changing climatic forcing

Science.gov (United States)

Singh, A.; Tejedor, A.; Zaliapin, I. V.; Reinhardt, L.; Foufoula-Georgiou, E.

2015-12-01

Quantification of the dynamics of landscape reorganization under changing climatic forcing is important to understand geomorphic transport laws under transient conditions, assess response of landscapes to external perturbations for future predictive modeling, and for interpreting past climate from stratigraphic record. For such an analysis, however, real landscape observations are limited. To this end, a series of controlled laboratory experiments on an evolving landscape were conducted at the St. Anthony Falls Laboratory at the University of Minnesota. High resolution elevation data at a temporal resolution of 5 min and spatial resolution of 0.5 mm were collected as the landscape approached steady state (constant uplift and precipitation rate) and in the transient state (under the same uplift and 5 times precipitation rate). Our results reveal rapid topographic re-organization under a five-fold increase in precipitation with the fluvial regime encroaching into the previously debris-dominated regime, widening and aggradation of channels and valleys, and accelerated erosion happening at hillslope scales. To better understand the initiation of the observed reorganization, we perform a connectivity and clustering analysis of the erosional and depositional events, showing strikingly different spatial patterns on landscape evolution under steady-state (SS) and transient-state (TS), even when the time under SS is renormalized to match the total volume of eroded and deposited sediment in TS. Our results suggest a regime shift in the behavior of transport processes on the landscape at the intermediate scales, i.e., from supply-limited to transport-limited.
Annotation Method (AM): SE7_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE7_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
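The two-stage lookup described in this record (primary databases first, a secondary search only for peaks with no primary hit) can be sketched as below. This is not PowerGet code: the toy dictionaries stand in for the real KEGG, KNApSAcK, LipidMAPS, exactMassDB and Pep1000 databases, and exact-match lookup on a rounded mass is a simplification chosen for the example.

```python
# Hypothetical toy databases keyed by rounded mass; real lookups are not modeled.
PRIMARY = {
    "KEGG": {180.063: "C6H12O6"},
    "KNApSAcK": {302.043: "C15H10O7"},
    "LipidMAPS": {},
}
SECONDARY = {
    "exactMassDB": {146.058: "C6H10O4"},
    "Pep1000": {},
}


def search(mass, databases):
    """Return (database name, hit) pairs for every database containing the mass."""
    return [(name, db[mass]) for name, db in databases.items() if mass in db]


def annotate(peaks):
    """Search primary databases first; only unmatched peaks go to the secondary ones."""
    result = {}
    for mass in peaks:
        hits = search(mass, PRIMARY)
        stage = "primary"
        if not hits:
            hits = search(mass, SECONDARY)
            stage = "secondary" if hits else "unannotated"
        result[mass] = (stage, hits)
    return result


annotated = annotate([180.063, 146.058, 999.999])
for mass, (stage, hits) in annotated.items():
    print(mass, stage, hits)
```

A real search would match within a mass tolerance instead of requiring identical keys; that detail is left out here because the record does not specify it.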
Annotation Method (AM): SE36_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE36_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE14_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE14_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE33_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE33_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE12_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE12_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE20_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE20_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE2_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE2_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE28_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE28_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE11_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE17_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE17_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE10_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE10_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE4_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE4_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE9_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE9_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE3_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE3_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE25_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE25_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

Annotation Method (AM): SE30_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE30_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE16_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE16_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE29_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE29_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE35_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE35_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE6_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE6_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE1_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE1_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE8_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE8_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE13_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE13_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE26_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE26_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE27_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE27_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE34_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE34_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE5_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE5_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
Annotation Method (AM): SE15_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE15_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE31_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE31_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
Annotation Method (AM): SE32_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available SE32_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

Science.gov (United States)

Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

2016-06-13

Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. For those using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
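To make the idea of a feature location as subject-predicate-object triples concrete, the sketch below builds plain Python tuples for one forward-strand feature. The `example.org` URIs and the `region_triples` helper are hypothetical, and the FALDO class and property names (`Region`, `ExactPosition`, `ForwardStrandPosition`, `location`, `begin`, `end`, `position`, `reference`) are assumptions that should be checked against the published ontology before use.

```python
FALDO = "http://biohackathon.org/resource/faldo#"  # assumed FALDO namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def region_triples(feature, reference, begin, end):
    """Describe a forward-strand feature location as a list of triples."""
    region = f"{feature}/region"
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    for prop, pos in (("begin", begin), ("end", end)):
        node = f"{region}/{prop}"
        triples += [
            (region, FALDO + prop, node),
            (node, RDF_TYPE, FALDO + "ExactPosition"),
            (node, RDF_TYPE, FALDO + "ForwardStrandPosition"),
            (node, FALDO + "position", pos),
            (node, FALDO + "reference", reference),
        ]
    return triples


# Hypothetical feature and reference sequence URIs.
triples = region_triples("http://example.org/gene1", "http://example.org/chr1", 100, 200)
for s, p, o in triples:
    print(s, p, o)
```

Because each position carries its own reference and strand, the same pattern can describe features on different sequences or file formats, which is the integration benefit the abstract points to.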
Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis

Directory of Open Access Journals (Sweden)

Simaki Vasiliki

2018-03-01

Full Text Available This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features, such as punctuation, words and grammatical categories that occur in the sentences in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine stance similarities in the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.
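A few of the surface features mentioned in this abstract (sentence length in words, word length, punctuation, digits) can be counted with a short function. This is only a sketch of the kind of measurement involved: the feature set, the tokenization rule and the example sentence are assumptions, not the study's actual procedure or data.

```python
import re
import string


def formal_features(sentence: str) -> dict:
    """Count a few surface features of one sentence."""
    # Simplified tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", sentence)
    return {
        "n_words": len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_punctuation": sum(ch in string.punctuation for ch in sentence),
        "n_digits": sum(ch.isdigit() for ch in sentence),
    }


# Hypothetical sentence, not taken from the corpus.
example = "We must leave, but we cannot say when."
features = formal_features(example)
print(features)
```

Averaging such counts per stance category would give profiles comparable in spirit to the "around 21 words per sentence" figure reported above.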
Effect of woodlots on thrips density in leek fields: a landscape analysis

NARCIS (Netherlands)

Belder, den E.; Elderson, J.; Brink, van den W.J.; Schelling, G.C.

2002-01-01

The effect of woodlots, natural areas and agricultural land in the landscape on a generalist herbivore insect species in cropland was investigated. The abundance of onion thrips (Thrips tabaci) was compared in leek (Allium porrum) fields in 43 agricultural landscape plots of different sizes in The
Characterizing European cultural landscapes

DEFF Research Database (Denmark)

Tieskens, Koen F.; Schulp, Catharina J E; Levers, Christian

2017-01-01

Almost all rural areas in Europe have been shaped or altered by humans and can be considered cultural landscapes, many of which now are considered to entail valuable cultural heritage. Current dynamics in land management have put cultural landscapes under a huge pressure of agricultural intensification and land abandonment. To prevent the loss of cultural landscapes, knowledge on the location of different types of cultural landscapes is needed. In this paper, we present a characterization of European cultural landscapes based on the prevalence of three key dimensions of cultural landscapes... the three dimensions into a continuous “cultural landscape index” that allows for a characterization of Europe's rural landscapes. The characterization identifies hotspots of cultural landscapes, where all three dimensions are present, such as in the Mediterranean. On the other hand, Eastern and Northern...
EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

Science.gov (United States)

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

2016-01-01

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.
