WorldWideScience

Sample records for annotation est-ssr characterization

  1. De novo assembly of transcriptome sequencing in Caragana korshinskii Kom. and characterization of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Yan Long

    Full Text Available Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no genomic information or EST-SSR markers have been reported for the genus Caragana Fabr. In this study, more than two billion bases of high-quality C. korshinskii sequence were generated using Illumina sequencing technology, demonstrating de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp). The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarity to known proteins from the NCBI non-redundant and Swiss-Prot protein databases, respectively. Among these annotated unigenes, 26,232 unigenes were assigned to the Gene Ontology (GO) database. When 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 5,598 unigenes were assigned to 5 main categories including 32 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (2,862, 43.7%), suggesting active metabolic processes in this desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characteristics of the EST-SSRs were compared with those of four other species in Fabaceae. A total of 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among the 9 germplasms in the genus Caragana Fabr., the PCR success rate was 93.7%, and a phylogenetic tree was constructed based on the genotypic data. This research generated a substantial fraction of transcriptome sequences, which are very useful resources for gene annotation and discovery, molecular marker development, and genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding.

  2. Characterization and development of EST-SSR markers in sweet potato (Ipomoea batatas (L.) Lam).

    Science.gov (United States)

    Kim, Jin-Hee; Kim, Jun-Hoi; Jo, Won-Sam; Ham, Jeong-Gwan; Chung, Il Kyung; Kim, Kyung-Min

    2016-12-01

    In this study, a cDNA library was constructed from the total RNA of sweet potato leaves. A total of 789 copies of the cDNA were cloned in Escherichia coli by employing the pGEM-T Easy vector. Sequencing was carried out by Solgent Co. (Korea). As many as 579 expressed sequence tag-simple sequence repeat (EST-SSR) markers were designed (73.38%) from the known cDNA nucleotide base sequences. The lengths of the developed EST-SSR markers ranged from 100 to 499 bp (average length 238 bp). Their motif sequence types were varied, with most being dinucleotides and pentanucleotides, and the most commonly found motifs were CAGAAT (29.0%) and TCT (2.8%). Based on these SSR-containing sequences, 619 pairs of high-quality SSR primers were designed using WebSat and Primer3web. The total number of primers designed was 144. Polymorphism was evident in 82 EST-SSR markers among 20 Korean sweet potato cultivars tested and in 90 EST-SSR markers in the two parents of a mapping population, Yeseumi and Annobeny. In this study, the hexaploid sweet potato (2n = 6x = 90) EST-SSR markers were developed in the absence of full-sequence data. Moreover, by acting as a molecular tag for particular traits, the EST-SSR marker can also simultaneously identify information about the corresponding gene. These EST-SSR markers will allow the molecular analysis of sweet potato to be done more efficiently. Thus, we can develop high-quality sweet potato while overcoming the challenges from climate change and other unfavorable conditions.

  3. Characterization of variable EST SSR markers for Norway spruce (Picea abies L.)

    Directory of Open Access Journals (Sweden)

    Spiess Nadine

    2011-10-01

    Full Text Available Background: Norway spruce is widely distributed across Europe and is the predominant tree of the Alpine region. Fast growth and the fact that timber can be harvested cost-effectively in relatively young populations define its status as one of the economically most important tree species of Northern Europe. In this study, EST-derived simple sequence repeat (SSR) markers were developed for the assessment of putative functional diversity in Austrian Norway spruce stands. Results: SSR sequences were identified by analyzing 14,022 publicly available EST sequences. Tri-nucleotide repeat motifs were most abundant in the data set, followed by penta- and hexa-nucleotide repeats. Specific primer pairs were designed for sixty loci. Among these, 27 displayed polymorphism in a testing population of 16 P. abies individuals sampled across Austria and in an additional screening population of 96 P. abies individuals from two geographically distinct Austrian populations. Allele numbers per locus ranged from two to 17, with observed heterozygosity ranging from 0.075 to 0.99. Conclusions: We have characterized variable EST SSR markers for Norway spruce detected in expressed genes. Due to their moderate to high degree of variability in the two tested screening populations, these newly developed SSR markers are well suited for the analysis of stress-related functional variation present in Norway spruce populations.

  4. Genetic characterization of an elite coffee germplasm assessed by gSSR and EST-SSR markers.

    Science.gov (United States)

    Missio, R F; Caixeta, E T; Zambolim, E M; Pena, G F; Zambolim, L; Dias, L A S; Sakiyama, N S

    2011-10-06

    Coffee is one of the main agrifood commodities traded worldwide. In 2009, coffee accounted for 6.1% of the value of Brazilian agricultural production, generating a revenue of US$6 billion. Despite the importance of coffee production in Brazil, it is supported by a narrow genetic base, with few accessions. Molecular differentiation and diversity of a coffee breeding program were assessed with gSSR and EST-SSR markers. The study comprised 24 coffee accessions according to their genetic origin: arabica accessions (six traditional genotypes of C. arabica), resistant arabica (six leaf rust-resistant C. arabica genotypes with introgression of Híbrido de Timor), robusta (five C. canephora genotypes), Híbrido de Timor (three C. arabica x C. canephora), triploids (three C. arabica x C. racemosa), and racemosa (one C. racemosa). Allele and polymorphism analysis, AMOVA, the Student t-test, Jaccard's dissimilarity coefficient, cluster analysis, correlation of genetic distances, and discriminant analysis, were performed. EST-SSR markers gave 25 exclusive alleles per genetic group, while gSSR showed 47, which will be useful for differentiating accessions and for fingerprinting varieties. The gSSR markers detected a higher percentage of polymorphism among (35% higher on average) and within (42.9% higher on average) the genetic groups, compared to EST-SSR markers. The highest percentage of polymorphism within the genetic groups was found with gSSR markers for robusta (89.2%) and for resistant arabica (39.5%). It was possible to differentiate all genotypes including the arabica-related accessions. Nevertheless, combined use of gSSR and EST-SSR markers is recommended for coffee molecular characterization, because EST-SSRs can provide complementary information.
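    The Jaccard dissimilarity coefficient named in this abstract can be illustrated with a minimal sketch. It assumes each accession is scored as a binary allele presence/absence vector; the two example vectors are hypothetical and are not data from the study.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity between two binary allele presence/absence vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Bands present in both accessions
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # Bands present in at least one accession (joint absences are ignored)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def jaccard_dissimilarity(a, b):
    """Dissimilarity is the complement of the similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical accessions scored at five allele positions
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]
print(jaccard_dissimilarity(accession_1, accession_2))
```

    Joint absences are excluded from the denominator, which is the property that distinguishes Jaccard from a simple matching coefficient.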

  5. Development and Characterization of 37 Novel EST-SSR Markers in Pisum sativum (Fabaceae)

    Directory of Open Access Journals (Sweden)

    Xiaofeng Zhuang

    2013-01-01

    Full Text Available Premise of the study: Simple sequence repeat markers were developed based on expressed sequence tags (EST-SSR) and screened for polymorphism among 23 Pisum sativum individuals to assist development and refinement of pea linkage maps. In particular, the SSR markers were developed to assist in mapping of white mold disease resistance quantitative trait loci. Methods and Results: Primer pairs were designed for 46 SSRs identified in EST contiguous sequences assembled from a 454 pyrosequenced transcriptome of the pea cultivar 'LIFTER'. Thirty-seven SSR markers amplified PCR products, of which 11 (30%) SSR markers produced polymorphism in 23 individuals, including parents of recombinant inbred lines, with two to four alleles. The observed and expected heterozygosities ranged from 0 to 0.43 and from 0.31 to 0.83, respectively. Conclusions: These EST-SSR markers for pea will be useful for refinement of pea linkage maps, and will likely be useful for comparative mapping of pea and as tools for marker-based pea breeding.

  6. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.)

    Directory of Open Access Journals (Sweden)

    Yang Jun-Bo

    2010-12-01

    Full Text Available Background: The castor bean (Ricinus communis L.), a monotypic species in the spurge family (Euphorbiaceae, 2n = 20), is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85%) in its seed oil, castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesives and biodiesel. Due to a lack of efficient molecular markers, little is known about the population genetic diversity and the genetic relationships among castor bean germplasm. Efficient and robust molecular markers are increasingly needed for breeding and improving varieties in castor bean. The advent of modern genomics has produced large amounts of publicly available DNA sequence data. In particular, expressed sequence tags (ESTs) provide valuable resources to develop gene-associated SSR markers. Results: In total, 18,928 publicly available non-redundant castor bean EST sequences, representing approximately 17.03 Mb, were evaluated, and 7,732 SSR sites in 5,122 ESTs were identified by data mining. Castor bean exhibited a considerably high frequency of EST-SSRs. We developed and characterized 118 polymorphic EST-SSR markers from 379 primer pairs flanking repeats by screening 24 castor bean samples collected from different countries. A total of 350 alleles were identified from 118 polymorphic SSR loci, ranging from 2 to 6 per locus (A), with an average of 2.97. The EST-SSR markers developed displayed moderate gene diversity (He), with an average of 0.41. Genetic relationships among 24 germplasms were investigated using the genotypes of 350 alleles, showing a geographic pattern of genotypes across genetic diversity centers of castor bean. Conclusion: Castor bean EST sequences exhibited a considerably high frequency of SSR sites, and were rich resources for developing EST-SSR markers. These EST-SSR markers would be particularly
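    The two summary statistics quoted in this abstract, mean alleles per locus (A) and gene diversity (He), can be sketched as follows. The formula used is the standard He = 1 - sum(p_i^2) without small-sample correction, which is an assumption; the abstract does not say which estimator the authors applied. The allele calls in the example are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Gene diversity He = 1 - sum(p_i^2) for one locus (uncorrected estimator)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def mean_alleles_per_locus(total_alleles, n_loci):
    """Average number of alleles per locus (A)."""
    return total_alleles / n_loci


# Hypothetical allele calls (fragment sizes in bp) at a single locus
locus = [150, 150, 150, 156, 156, 162]
print(round(expected_heterozygosity(locus), 3))

# Figures reported in the abstract: 350 alleles over 118 polymorphic loci
print(round(mean_alleles_per_locus(350, 118), 2))
```

    With the reported totals, 350 / 118 rounds to 2.97, matching the average stated in the abstract.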

  7. Development and Characterization of 1,906 EST-SSR Markers from Unigenes in Jute (Corchorus spp.).

    Directory of Open Access Journals (Sweden)

    Liwu Zhang

    Full Text Available Jute, comprising white and dark jute, is the second most important natural fiber crop worldwide after cotton. However, the lack of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers has resulted in a large gap in the improvement of jute. Previously, 48,914 unigenes from white jute were assembled de novo. In this study, 1,906 EST-SSRs were identified from these assembled unigenes. Among these markers, di-, tri- and tetra-nucleotide repeat types were the most abundant (12.0%, 56.9% and 21.6%, respectively). AG-rich or GA-rich nucleotide repeats were predominant. Subsequently, a sample of 116 SSRs, located in genes encoding transcription factors and cellulose synthases, was selected to survey polymorphisms among 12 diverse jute accessions. Of these, 83.6% successfully amplified at least one fragment and detected polymorphism among the 12 diverse genotypes, indicating that the newly developed SSRs are of good quality. Furthermore, the genetic similarity coefficients of all 12 accessions were evaluated using 97 polymorphic SSRs. The cluster analysis divided the jute accessions into two main groups at a genetic similarity coefficient of 0.61. These EST-SSR markers not only enrich the molecular markers available for the jute genome, but also facilitate genetic and genomic research in jute.

  8. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

    Science.gov (United States)

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed sequence tags (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps and investigation of fiber growth and development in kenaf, and will also be of value to novel gene discovery and functional genomic studies. PMID:26960153
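    The SSR mining step described here can be approximated with a short regular-expression scan. This is a simplified sketch of perfect-repeat detection, not the MISA program itself; the minimum repeat counts in MIN_REPEATS and the example sequence are hypothetical, since thresholds vary between studies and are not given in the abstract.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. AGAG is really AG)
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return hits


# Hypothetical unigene fragment with one (AG)8 and one (AAG)5 repeat
example = "GGC" + "AG" * 8 + "TTCA" + "AAG" * 5 + "C"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

    Compound and imperfect repeats, which dedicated tools also report, are deliberately left out to keep the sketch short.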

  9. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers.

    Science.gov (United States)

    Wei, Wenliang; Qi, Xiaoqiong; Wang, Linhai; Zhang, Yanxin; Hua, Wei; Li, Donghua; Lv, Haixia; Zhang, Xiurong

    2011-09-19

    Sesame is an important oil crop, but limited transcriptomic and genomic data are currently available. This information is essential to clarify the fatty acid and lignan biosynthesis molecular mechanism. In addition, a shortage of sesame molecular markers limits the efficiency and accuracy of genetic breeding. High-throughput transcriptomic sequencing is essential to generate a large transcriptome sequence dataset for gene discovery and molecular marker development. Sesame transcriptomes from five tissues were sequenced using Illumina paired-end sequencing technology. The cleaned raw reads were assembled into a total of 86,222 unigenes with an average length of 629 bp. Of the unigenes, 46,584 (54.03%) had significant similarity with proteins in the NCBI nonredundant protein database and Swiss-Prot database (E-value < 10^-5). Of these annotated unigenes, 10,805 and 27,588 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In total, 22,003 (25.52%) unigenes were mapped onto 119 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Furthermore, 44,750 unigenes showed homology to 15,460 Arabidopsis genes based on BLASTx analysis against The Arabidopsis Information Resource (TAIR, Version 10) and revealed relatively high gene coverage. In total, 7,702 unigenes were converted into SSR markers (EST-SSR). Dinucleotide SSRs were the dominant repeat motif (67.07%, 5,166), followed by trinucleotide (24.89%, 1,917), tetranucleotide (4.31%, 332), hexanucleotide (2.62%, 202), and pentanucleotide (1.10%, 85) SSRs. AG/CT (46.29%) was the dominant repeat motif, followed by AC/GT (16.07%), AT/AT (10.53%), AAG/CTT (6.23%), and AGG/CCT (3.39%). Fifty EST-SSRs were randomly selected to validate amplification and to determine the degree of polymorphism in the genomic DNA pools. Forty primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among 24 sesame accessions

  10. Construction of full-length cDNA library and development of EST-derived simple sequence repeat (EST-SSR) markers in Senecio scandens.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Lu, Jian; Zhang, Zhen; Wang, Lei; Xu, Delin

    2014-12-01

    Senecio scandens Buch.-Ham. ex D. Don (Compositae) is a crucial source of Chinese traditional medicine with antibacterial properties. We constructed a cDNA library and obtained expressed sequence tags (ESTs) to show the distribution of gene ontology annotations for mRNAs, using an individual plant with superior antibacterial characteristics. Analysis of comparative genomics indicates that the putative uncharacterized proteins (21.07%) might be derived from "molecular function unknown" clones or rare transcripts. Furthermore, the Compositae had high cross-species transferability of EST-derived simple sequence repeats (EST-SSR), based on valid amplifications of 206 primer pairs developed from the newly assembled expressed sequence tag sequences in Artemisia annua L. Among those EST-SSR markers, 52 primers showed polymorphic amplifications between individuals with contrasting diverse antibacterial traits. Our sequence data and molecular markers will be cost-effective tools for further studies such as genome annotation, molecular breeding, and novel transcript profiles within Compositae species.

  11. A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica

    Directory of Open Access Journals (Sweden)

    Ueno Saneyoshi

    2012-04-01

    Full Text Available Background: Microsatellites or simple sequence repeats (SSRs) in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating the development of a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica). Results: We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. 4,059 SSRs were found in 3,694 (4.54%) contigs, giving an SSR frequency lower than that in seven other plant species with gene indices (5.4–21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared to 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, which was found in 655 (46.3%) di-SSRs, followed by the AAG motif, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, but 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3′ untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented within SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly-developed CMiB, which combines several open tools. Markers resulting from both pipelines showed no differences in PCR success rate and polymorphisms, but PCR success and polymorphism were significantly affected by the expected PCR product size

  12. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.

  13. Development of genomic SSR and potential EST-SSR markers in ...

    African Journals Online (AJOL)

    In addition, forty four EST-SSRs which can be amplified with expected sizes were identified from a B. chinense root cDNA library. The genomic SSR markers and potential EST-SSR markers developed in the present study should be useful for genetic diversity and molecular marker assistant selection breeding research in ...

  14. Development of a novel set of EST-SSR markers and cross-species amplification in Tamarix africana (Tamaricaceae).

    Science.gov (United States)

    Terzoli, Serena; Beritognolo, Isacco; Sabatti, Maurizio; Kuzminsky, Elena

    2010-06-01

    Tamarix plants are resistant to abiotic stresses and have become invasive in North America. Their taxonomy is troublesome, and few molecular markers are available to enable species identification or to track the spread of specific invasive genotypes. Transcriptome sequencing projects offer a potential source for the development of new markers. Thirteen polymorphic simple sequence repeat (SSR) markers derived from expressed sequence tags (ESTs) from Tamarix hispida, T. androssowii, T. ramosissima, and T. albiflonum were identified and screened on 24 samples of T. africana to detect polymorphism. The number of alleles per locus ranged from two to eight, with an average of 4.3 alleles per locus, and the mean expected heterozygosity was 0.453. Amplification products of these 13 loci were also generated for T. gallica. These new EST-SSR markers will be useful in genetic characterization of Tamarix, as additional tools for taxonomic clarification, and for studying invasive populations where they are a threat.

  15. DEVELOPMENT OF EST-SSR MARKERS TO ASSESS GENETIC DIVERSITY OF BROCCOLI AND ITS RELATED SPECIES

    Directory of Open Access Journals (Sweden)

    Nur Kholilatul Izzah

    2017-01-01

    Full Text Available Development of Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) markers derived from public databases is known to be more efficient, faster and lower in cost. The objective of this study was to generate a new set of EST-SSR markers for broccoli and its related species and to evaluate their usefulness for assessing genetic diversity. A total of 202 Brassica oleracea ESTs were retrieved from NCBI and then assembled into 172 unigenes by means of the CAP3 program. Identification of SSRs was carried out using the web-based tool RepeatMasker. Afterwards, EST-SSR markers were developed using the Primer3 program. Among the identified SSRs, trinucleotide repeats were the most common repeat type, accounting for about 50%. A total of eight primer pairs were successfully designed and yielded amplification products. Among them, five markers were polymorphic and displayed a total of 30 alleles, with an average of six alleles per locus. The polymorphic markers were subsequently used for analyzing the genetic diversity of 36 B. oleracea cultivars, including 22 broccoli, five cauliflower and nine kohlrabi cultivars, based on a genetic similarity matrix as implemented in the NTSYS program. At a similarity coefficient of 61%, a UPGMA clustering dendrogram effectively separated the 36 genotypes into three main groups, where 30 out of 36 genotypes were clearly discriminated. The results obtained in the present study would help breeders in selecting parental lines for crossing. Moreover, the novel EST-SSR markers developed in the study could be a valuable tool for differentiating cultivars of broccoli and related species.

  16. Development, cross-species/genera transferability of novel EST-SSR markers and their utility in revealing population structure and genetic diversity in sugarcane

    KAUST Repository

    Singh, Ram K.

    2013-07-01

    Sugarcane (Saccharum spp. hybrid), with its complex polyploid genome, requires a large number of informative DNA markers for various applications in genetics and breeding. Despite the great advances in genomic technology, the availability of molecular tools such as microsatellite markers remains limited in several crop species, especially in sugarcane. Nowadays, EST-SSR markers are preferred to genomic SSRs (gSSRs) because they represent only the functional part of the genome, which can be easily associated with a desired trait. The present study was taken up with a new set of 351 EST-SSRs developed from the 4085 non-redundant EST sequences of two Indian sugarcane cultivars. Among these EST-SSRs, TNR-containing motifs were predominant, with a frequency of 51.6%. Thirty percent of the EST-SSRs showed homology with annotated proteins. A high frequency of SSRs was found in the 5' UTR and in the ORF (about 27%), and a low frequency was observed in the 3' UTR (about 8%). Two hundred twenty-seven EST-SSRs were evaluated in sugarcane, allied genera of sugarcane and cereals, and 134 of these revealed polymorphism, with PIC values ranging from 0.12 to 0.99. The cross-transferability rate ranged from 87.0% to 93.4% in the Saccharum complex, 80.0% to 87.0% in allied genera, and 76.0% to 80.0% in cereals. Cloning and sequencing of EST-SSR size variant amplicons revealed that variation in the number of repeat units was the main source of EST-SSR fragment polymorphism. When 124 sugarcane accessions were analyzed for population structure using a model-based approach, seven genetically distinct groups or admixtures thereof were observed in sugarcane. Results of principal coordinate analysis or UPGMA to evaluate genetic relationships also delineated the 124 accessions into seven groups. Thus, a high level of polymorphism, adequate genetic diversity and population structure assayed with the EST-SSR markers not only suggested their utility in various applications in genetics and genomics in
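    The PIC (polymorphism information content) values reported here are commonly computed from allele frequencies. The sketch below assumes the widely used definition PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 p_i^2 p_j^2; the abstract does not state which formula the authors used, and the frequencies in the example are hypothetical.

```python
def pic(freqs, tol=1e-9):
    """Polymorphism information content for one locus from allele frequencies."""
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    # Probability that two random alleles are identical
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross


# Hypothetical locus with two equally frequent alleles
print(pic([0.5, 0.5]))
```

    A monomorphic locus gives a PIC of 0, and the value approaches 1 only as the number of equally frequent alleles grows large.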

  17. Development of Novel Polymorphic EST-SSR Markers in Bailinggu (Pleurotus tuoliensis) for Crossbreeding

    Directory of Open Access Journals (Sweden)

    Yueting Dai

    2017-11-01

    Full Text Available Identification of monokaryons and their mating types and discrimination of hybrid offspring are key steps for the crossbreeding of Pleurotus tuoliensis (Bailinggu). However, conventional crossbreeding methods are troublesome and time consuming. Using RNA-seq technology, we developed new expressed sequence tag-simple sequence repeat (EST-SSR) markers for Bailinggu to easily and rapidly identify monokaryons and their mating types, genetic diversity and hybrid offspring. We identified 1110 potential EST-based SSR loci from a newly-sequenced Bailinggu transcriptome and then randomly selected 100 EST-SSRs for further validation. Results showed that 39, 43 and 34 novel EST-SSR markers successfully identified monokaryons from their parent dikaryons, differentiated two different mating types and discriminated F1 and F2 hybrid offspring, respectively. Furthermore, a total of 86 alleles were detected in 37 monokaryons using 18 highly informative EST-SSRs. The observed number of alleles per locus ranged from three to seven. Cluster analysis revealed that these monokaryons have a relatively high level of genetic diversity. Transfer rates of the EST-SSRs in the monokaryons of closely-related species Pleurotus eryngii var. ferulae and Pleurotus ostreatus were 72% and 64%, respectively. Therefore, our study provides new SSR markers and an efficient method to enhance the crossbreeding of Bailinggu and closely-related species.

  18. Genetic diversity of Phytophthora sojae isolates in Heilongjiang Province in China assessed by RAPD and EST-SSR

    Science.gov (United States)

    Wu, J. J.; Xu, P. F.; Liu, L. J.; Wang, J. S.; Lin, W. G.; Zhang, S. Z.; Wei, L.

    Random-amplified polymorphic DNA (RAPD) and EST-SSR markers were used to estimate the genetic relationships among thirty-nine P. sojae isolates from three locations in Heilongjiang Province, with nine isolates from Ohio, USA, used as reference strains. Ten of 50 RAPD primers and 5 of 33 EST-SSR markers were polymorphic across the 48 P. sojae isolates. Similarity values among P. sojae isolates ranged from 49% to 82% based on the RAPD data. The similarities based on EST-SSR markers ranged from 47% to 85%. The genetic diversity revealed by EST-SSR marker analysis was higher than that obtained from RAPD. The similarity matrices for the SSR data and the RAPD data were moderately correlated (r = 0.47). Genetic similarity coefficients were also relatively low, indicating a complex genetic background within each location. The wide range of similarity values revealed the ability of RAPD and EST-SSR markers to distinguish even among morphologically similar Phytophthora isolates.

  19. SSR and EST-SSR-based genetic linkage map of cassava (Manihot esculenta Crantz).

    Science.gov (United States)

    Sraphet, Supajit; Boonchanawiwat, Athipong; Thanyasiriwat, Thanwanit; Boonseng, Opas; Tabata, Satoshi; Sasamoto, Shigemi; Shirasawa, Kenta; Isobe, Sachiko; Lightfoot, David A; Tangphatsornruang, Sithichoke; Triwitayakorn, Kanokporn

    2011-04-01

    Simple sequence repeat (SSR) markers provide a powerful tool for genetic linkage map construction that can be applied to the identification of quantitative trait loci (QTL). In this study, a total of 640 new SSR markers were developed from an enriched genomic DNA library of the cassava variety 'Huay Bong 60', and 1,500 novel expressed sequence tag-simple sequence repeat (EST-SSR) loci were developed from the GenBank database. To construct a genetic linkage map of cassava, a mapping population of 100 F(1) lines was developed from the cross 'Huay Bong 60' by 'Hanatee'. Polymorphism screening between the parental lines identified 199 SSRs and 168 EST-SSRs as novel polymorphic markers. Combining these with previously developed SSRs, we report a linkage map consisting of 510 markers encompassing 1,420.3 cM, distributed on 23 linkage groups with a mean distance between markers of 4.54 cM. Comparison of the SSR order on the cassava linkage map with the cassava genome sequences allowed us to locate 284 scaffolds on the genetic map. Although the number of linkage groups reported here revealed that this F(1) genetic linkage map is not yet saturated, it encompassed around 88% of the cassava genome, indicating that the map was almost complete. Therefore, sufficient markers now exist to encompass most of the genome and efficiently map traits in cassava.

  20. Three vibrio-resistance related EST-SSR markers revealed by selective genotyping in the clam Meretrix meretrix.

    Science.gov (United States)

    Nie, Qing; Yue, Xin; Chai, Xueliang; Wang, Hongxia; Liu, Baozhong

    2013-08-01

    The clam Meretrix meretrix is an important commercial bivalve distributed in the coastal areas of South and Southeast Asia. In this study, marker-trait association analyses were performed on stocks of M. meretrix with different vibrio-resistance profiles obtained by selective breeding. Forty-eight EST-SSR markers were screened, and the 27 polymorphic SSRs among them were genotyped in clam stocks with different resistance to Vibrio parahaemolyticus (11-R and 11-S) and to Vibrio harveyi (09-R and 09-C). Allele frequency distributions of the SSRs among the stocks were compared using Pearson's Chi-square test, and three functional EST-SSR markers (MM959, MM4765 and MM8364) were found to be associated with the vibrio-resistance trait. The 140-bp allele of MM959 and the 128-bp allele of MM4765 had significantly higher frequencies in the resistant groups (11-R and 09-R) than in the susceptible/control groups (11-S and 09-C) (P markers were consistent with the three subgroups distinctions. The putative functions of contig959, contig4765 and contig8364 also suggested that the three SSR-involved genes might play important roles in the immunity of M. meretrix. All these results supported that EST-SSR markers MM959, MM4765 and MM8364 were associated with vibrio-resistance and would be useful for marker-assisted selection (MAS) in M. meretrix genetic breeding. Copyright © 2013 Elsevier Ltd. All rights reserved.
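
    The allele-frequency comparison described in this record can be illustrated with a small sketch. It computes Pearson's Chi-square statistic for a table of allele counts using only the Python standard library. The counts, the function names and the two-by-two layout (focal allele versus all other alleles, resistant versus susceptible stock) are assumptions made for illustration and are not taken from the study; the p-value helper is valid only for one degree of freedom.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of stock and allele.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical allele counts (NOT from the study): rows are stocks,
# columns are [copies of the focal allele, copies of all other alleles].
counts = [
    [30, 70],  # resistant stock
    [12, 88],  # susceptible stock
]
statistic, dof = pearson_chi_square(counts)
p_value = p_value_one_dof(statistic)
print(f"chi-square = {statistic:.3f}, dof = {dof}, p = {p_value:.4f}")
```

    For tables with more than two alleles or stocks the statistic is computed the same way, but the p-value then needs the general chi-square distribution, which a statistics library would normally supply.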

  1. First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers.

    Science.gov (United States)

    Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique

    2016-08-04

    Taraxacum koksaghyz Rodin (TKS) has been studied on many occasions as a possible alternative source of good-quality natural rubber and of inulin. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and on inulin applications for health improvement and in the food industry. Only limited genomic resources exist for TKS and, in particular, no genetic linkage map is available for this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. BLAST searches of the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. BLAST searches against plant gene databases revealed some homologies with genes that may be useful for downstream applications in the future.

  2. First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers

    Science.gov (United States)

    Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique

    2016-01-01

    Taraxacum koksaghyz Rodin (TKS) has been studied on many occasions as a possible alternative source of good-quality natural rubber and of inulin. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and on inulin applications for health improvement and in the food industry. Only limited genomic resources exist for TKS and, in particular, no genetic linkage map is available for this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. BLAST searches of the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. BLAST searches against plant gene databases revealed some homologies with genes that may be useful for downstream applications in the future. PMID:27488242

  3. Development and characterization of EST-SSR markers for Artocarpus hypargyreus (Moraceae).

    Science.gov (United States)

    Liu, Haijun; Tan, Weizheng; Sun, Hongbin; Liu, Yu; Meng, Kaikai; Liao, Wenbo

    2016-12-01

    Polymorphic microsatellite markers were developed for Artocarpus hypargyreus (Moraceae), a threatened species endemic to China, to investigate the genetic diversity and structure of the species. Based on the transcriptome data of A. hypargyreus, 63 primer pairs were preliminarily designed and tested, of which 34 were successfully amplified and 10 displayed clear polymorphisms across the 67 individuals from four populations of A. hypargyreus. The results showed the number of alleles per locus ranged from three to 10, and the observed heterozygosity and expected heterozygosity per locus varied from 0.000 to 0.706 and from 0.328 to 0.807, respectively. These microsatellite markers will be useful in exploring genetic diversity and structure of A. hypargyreus. Furthermore, most loci were successfully cross-amplified in A. nitidus and A. heterophyllus, indicating that they will be of great value for genetic study across this genus.

  4. Isolation and characterization of twenty-nine novel EST-SSR ...

    Indian Academy of Sciences (India)

    1988) and shows relatively close relationship to S. scherzeri. S. undulata is considered Near Threatened (Zhao 2011) because it has been impacted by overfishing and pollution, and its population decline is suspected to be close to 30% over the past 10 years. There is thus a significant need to protect this species. So far ...

  5. Isolation and characterization of twenty-nine novel EST-SSR ...

    Indian Academy of Sciences (India)

    Genotyping errors due to null alleles, stutter bands, or allele dropout were checked by Micro-Checker ver. 2.2.3 software (Van Oosterhout et al. 2004). Results and discussion. As shown in table 1, 29 of the 73 tested loci were polymorphic. For the polymorphic loci, the number of alleles observed ranged from 3 to 13, with ...

  6. EST-SSR marker revealed effective over biochemical and morphological scepticism towards identification of specific turmeric (Curcuma longa L.) cultivars.

    Science.gov (United States)

    Sahoo, Ambika; Jena, Sudipta; Kar, Basudeba; Sahoo, Suprava; Ray, Asit; Singh, Subhashree; Joshi, Raj Kumar; Acharya, Laxmikanta; Nayak, Sanghamitra

    2017-05-01

    Turmeric (Curcuma longa L., family Zingiberaceae) is one of the most economically important plants for its use in the food, medicine, and cosmetic industries. Cultivar identification is a major constraint in turmeric, owing to a high degree of morphological similarity that in turn affects its commercialization. The present study addresses this constraint using EST-SSR marker-based molecular identification of 8 elite cultivars and 88 accessions of turmeric. Fifty EST-SSR primers were screened against eight cultivars of turmeric (Suroma, Roma, Lakadong, Megha, Alleppey Supreme, Kedaram, Pratibha, and Suvarna), of which 11 primers showed polymorphic banding patterns. The polymorphic information content (PIC) of these primers ranged from 0.13 to 0.48. However, only three SSR loci (CSSR 14, CSSR 15, and CSSR 18) gave reproducible unique banding patterns clearly distinguishing the cultivars 'Lakadong' and 'Suvarna' from the other cultivars tested. These three unique SSR markers also proved effective in identifying 'Lakadong' cultivars when analysed with 88 accessions of turmeric collected from different agro-climatic regions. Furthermore, the two identified cultivars (Lakadong and Suvarna) could also be precisely differentiated from the other 94 genotypes of turmeric in a phylogenetic tree. The novel SSR markers can be used for identification and authentication of the two commercially important turmeric cultivars 'Lakadong' and 'Suvarna'.

  7. Construction of a genetic map using EST-SSR markers and QTL analysis of major agronomic characters in hexaploid sweet potato (Ipomoea batatas (L.) Lam).

    Science.gov (United States)

    Kim, Jin-Hee; Chung, Il Kyung; Kim, Kyung-Min

    2017-01-01

    Sweet potato, Ipomoea batatas (L.) Lam, is difficult to study genetically and genomically because it is a hexaploid. Such studies of sweet potato have not been performed domestically or internationally. This study was performed to construct a genetic map and to carry out quantitative trait loci (QTL) analysis. A total of 245 EST-SSR markers were developed, and the map was constructed using 210 of those markers. The total map length was 1508.1 cM, and the mean distance between markers was 7.2 cM. Fifteen characteristics were investigated for QTL analysis. Four QTLs were identified, with a LOD score of 3.0. Further studies are needed to develop molecular markers, in particular EST-SSR markers, that enable efficient breeding. The genetic map created here using EST-SSR markers will facilitate planned breeding of sweet potato cultivars with various desirable traits.
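
    The map statistics quoted in this record can be checked with a few lines of arithmetic. The sketch below assumes that the mean marker distance is simply the total map length divided by the number of mapped markers (which reproduces the reported 7.2 cM), and uses the standard definition of a LOD score as the base-10 logarithm of a likelihood ratio. The function names are illustrative, not taken from the paper.

```python
def mean_marker_spacing(total_length_cm, n_markers):
    """Average map distance per marker, in centimorgans."""
    return total_length_cm / n_markers


def lod_to_likelihood_ratio(lod):
    """A LOD score is log10 of the likelihood ratio (linkage vs. no linkage)."""
    return 10.0 ** lod


spacing = mean_marker_spacing(1508.1, 210)   # figures quoted in the abstract
ratio = lod_to_likelihood_ratio(3.0)         # LOD score quoted in the abstract
print(f"mean spacing = {spacing:.1f} cM, likelihood ratio at LOD 3.0 = {ratio:.0f}:1")
```

    Dividing by the number of marker intervals (markers minus linkage groups) instead of the number of markers would give a slightly larger spacing; the abstract does not say which convention was used.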

  8. Construction of a genetic map using EST-SSR markers and QTL analysis of major agronomic characters in hexaploid sweet potato (Ipomoea batatas (L.) Lam)

    Science.gov (United States)

    Kim, Jin-Hee; Chung, Il Kyung

    2017-01-01

    Sweet potato, Ipomoea batatas (L.) Lam, is difficult to study genetically and genomically because it is a hexaploid. Such studies of sweet potato have not been performed domestically or internationally. This study was performed to construct a genetic map and to carry out quantitative trait loci (QTL) analysis. A total of 245 EST-SSR markers were developed, and the map was constructed using 210 of those markers. The total map length was 1508.1 cM, and the mean distance between markers was 7.2 cM. Fifteen characteristics were investigated for QTL analysis. Four QTLs were identified, with a LOD score of 3.0. Further studies are needed to develop molecular markers, in particular EST-SSR markers, that enable efficient breeding. The genetic map created here using EST-SSR markers will facilitate planned breeding of sweet potato cultivars with various desirable traits. PMID:29020092

  9. An EST-SSR based linkage map for Persea americana Mill. (avocado)

    Science.gov (United States)

    Recent enhancement of the pool of known molecular markers for avocado has allowed the construction of the first moderate density genetic map for this species. Over 300 microsatellite markers have been characterized and 163 of these were used to construct a map from the cross of two Florida cultivar...

  10. Exploiting Illumina Sequencing for the Development of 95 Novel Polymorphic EST-SSR Markers in Common Vetch (Vicia sativa subsp. sativa)

    Directory of Open Access Journals (Sweden)

    Zhipeng Liu

    2014-05-01

    Full Text Available The common vetch (Vicia sativa subsp. sativa), a self-pollinating and diploid species, is one of the most important annual legumes in the world due to its short growth period, high nutritional value, and multiple usages as hay, grain, silage, and green manure. The available simple sequence repeat (SSR) markers for common vetch, however, are insufficient to meet the developing demand for genetic and molecular research on this important species. Here, we aimed to develop and characterise several polymorphic EST-SSR markers from the vetch Illumina transcriptome. A total of 1,071 potential EST-SSR markers were identified from 1,025 unigenes whose lengths were greater than 1,000 bp, and 450 primer pairs were then designed and synthesized. Finally, 95 polymorphic primer pairs were developed for the 10 common vetch accessions, which included 50 individuals. Among the 95 EST-SSR markers, the number of alleles ranged from three to 13, and the polymorphism information content values ranged from 0.09 to 0.98. The observed heterozygosity values ranged from 0.00 to 1.00, and the expected heterozygosity values ranged from 0.11 to 0.98. These 95 EST-SSR markers developed from the vetch Illumina transcriptome could greatly promote the development of genetic and molecular breeding studies pertaining to this species.

  11. Signatures of diversifying selection at EST-SSR loci and association with climate in natural Eucalyptus populations.

    Science.gov (United States)

    Bradbury, Donna; Smithson, Ann; Krauss, Siegfried L

    2013-10-01

    Understanding the environmental parameters that drive adaptation among populations is important in predicting how species may respond to global climatic changes and how gene pools might be managed to conserve adaptive genetic diversity. Here, we used Bayesian FST outlier tests and allele-climate association analyses to reveal two Eucalyptus EST-SSR loci as strong candidates for diversifying selection in natural populations of a southwestern Australian forest tree, Eucalyptus gomphocephala (Myrtaceae). The Eucalyptus homolog of a CONSTANS-like gene was an FST outlier, and allelic variation showed significant latitudinal clinal associations with annual and winter solar radiation, potential evaporation, summer precipitation and aridity. A second FST outlier locus, homologous to quinone oxidoreductase, was significantly associated with measures of temperature range, high summer temperature and summer solar radiation, with important implications for predicting the effect of temperature on natural populations in the context of climate change. We complemented these data with investigations into neutral population genetic structure and diversity throughout the species range. This study provides an investigation into selection signatures at gene-homologous EST-SSRs in natural Eucalyptus populations, and contributes to our understanding of the relationship between climate and adaptive genetic variation, informing the conservation of both putatively neutral and adaptive components of genetic diversity. © 2013 John Wiley & Sons Ltd.

  12. Determination of the genetic diversity of vegetable soybean [Glycine max (L.) Merr.] using EST-SSR markers.

    Science.gov (United States)

    Zhang, Gu-wen; Xu, Sheng-chun; Mao, Wei-hua; Hu, Qi-zan; Gong, Ya-ming

    2013-04-01

    The development of expressed sequence tag-derived simple sequence repeats (EST-SSRs) provided a useful tool for investigating plant genetic diversity. In the present study, 22 polymorphic EST-SSRs from grain soybean were identified and used to assess the genetic diversity in 48 vegetable soybean accessions. Among the 22 EST-SSR loci, tri-nucleotides were the most abundant repeats, accounting for 50.00% of the total motifs. GAA was the most common motif among tri-nucleotide repeats, with a frequency of 18.18%. Polymorphic analysis identified a total of 71 alleles, with an average of 3.23 per locus. The polymorphism information content (PIC) values ranged from 0.144 to 0.630, with a mean of 0.386. Observed heterozygosity (Ho) values varied from 0.0196 to 1.0000, with an average of 0.6092, while the expected heterozygosity (He) values ranged from 0.1502 to 0.6840, with a mean value of 0.4616. Principal coordinate analysis and phylogenetic tree analysis indicated that the accessions could be assigned to different groups based to a large extent on their geographic distribution, and most accessions from China were clustered into the same groups. These results suggest that Chinese vegetable soybean accessions have a narrow genetic base. The results of this study indicate that EST-SSRs from grain soybean have high transferability to vegetable soybean, and that these new markers would be helpful in taxonomy, molecular breeding, and comparative mapping studies of vegetable soybean in the future.
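
    The diversity statistics reported in this record (alleles per locus, PIC, Ho and He) follow from allele frequencies at each locus. A minimal sketch of the usual formulas is given below: He = 1 - sum(p_i^2), and the polymorphism information content as commonly defined, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The genotypes, allele sizes and function names are hypothetical and serve only to illustrate the calculation for a diploid locus; the abstract does not state which software or exact estimator the authors used.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum(p * p for p in freqs) - cross


# Hypothetical genotypes at one locus (allele sizes in bp); not data from the study.
locus = [(140, 140), (140, 146), (146, 152), (140, 146)]
freqs = allele_frequencies(locus)
print(f"Ho = {observed_heterozygosity(locus):.3f}")
print(f"He = {expected_heterozygosity(freqs):.3f}")
print(f"PIC = {pic(freqs):.3f}")

# Mean alleles per locus from the totals quoted in the abstract: 71 alleles, 22 loci.
print(f"alleles per locus = {71 / 22:.2f}")
```

    Because the cross term is never negative, PIC is always at most He for the same locus, which is consistent with the mean PIC (0.386) being lower than the mean He (0.4616) in the abstract.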

  13. Genetic Diversity and Association of EST-SSR and SCoT Markers with Rust Traits in Orchardgrass (Dactylis glomerata L.).

    Science.gov (United States)

    Yan, Haidong; Zhang, Yu; Zeng, Bing; Yin, Guohua; Zhang, Xinquan; Ji, Yang; Huang, Linkai; Jiang, Xiaomei; Liu, Xinchun; Peng, Yan; Ma, Xiao; Yan, Yanhong

    2016-01-08

    Orchardgrass (Dactylis glomerata L.) is a well-known perennial forage species; however, rust diseases have caused a noticeable reduction in the quality and production of orchardgrass. In this study, genetic diversity was assessed and the marker-trait associations for rust were examined using 18 EST-SSR and 21 SCoT markers in 75 orchardgrass accessions. A high level of genetic diversity was detected in orchardgrass, with an average genetic diversity index of 0.369. For the EST-SSR and SCoT markers, 164 and 289 total bands were obtained, of which 148 (90.24%) and 272 (94.12%) were polymorphic, respectively. Results from an AMOVA showed that more genetic variance existed within populations (87.57%) than among populations (12.43%). A comparison based on the marker index showed that SCoTs have higher marker efficiency (8.07) than EST-SSRs (4.82). The results of a UPGMA cluster analysis and a STRUCTURE analysis were both correlated with the geographic distribution of the orchardgrass accessions. Linkage disequilibrium analysis revealed an average r² of 0.1627 across all band pairs, indicating a high extent of linkage disequilibrium in the material. An association analysis between the rust trait and 410 bands from the EST-SSR and SCoT markers using TASSEL software revealed that 20 bands were associated with the rust trait in both 2011 and 2012. These 20 bands could be used in breeding programs for lineage selection to prevent great losses of orchardgrass caused by rust, and they provide valuable information for further association mapping using this collection of orchardgrass.
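
    Two of the quantities in this record lend themselves to a short worked example: the percentage of polymorphic bands and the pairwise r² used as a linkage disequilibrium measure. The sketch below assumes that each band is scored as present (1) or absent (0) per accession and computes r² as D² / (pA(1 - pA) pB(1 - pB)), which is the squared correlation of two binary scores. This is an illustrative simplification with hypothetical band scores and function names; it is not the TASSEL implementation used by the authors.

```python
def percent_polymorphic(polymorphic_bands, total_bands):
    """Share of scored bands that are polymorphic, in percent."""
    return 100.0 * polymorphic_bands / total_bands


def ld_r2(band_a, band_b):
    """Squared correlation between two presence/absence (1/0) band score vectors."""
    n = len(band_a)
    p_a = sum(band_a) / n
    p_b = sum(band_b) / n
    p_ab = sum(a * b for a, b in zip(band_a, band_b)) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r2 is undefined when a band is monomorphic")
    return d * d / denominator


# Band totals quoted in the abstract.
print(f"EST-SSR: {percent_polymorphic(148, 164):.2f}% polymorphic")
print(f"SCoT:    {percent_polymorphic(272, 289):.2f}% polymorphic")

# Hypothetical band scores across eight accessions (not data from the study).
band_1 = [1, 1, 1, 0, 0, 1, 0, 0]
band_2 = [1, 1, 0, 0, 0, 1, 1, 0]
print(f"r2 = {ld_r2(band_1, band_2):.4f}")
```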

  14. De novo transcriptomic analysis and development of EST-SSR markers in the Siberian tiger (Panthera tigris altaica).

    Science.gov (United States)

    Lu, Taofeng; Sun, Yujiao; Ma, Qin; Zhu, Minghao; Liu, Dan; Ma, Jianzhang; Ma, Yuehui; Chen, Hongyan; Guan, Weijun

    2016-12-01

    The Siberian tiger, Panthera tigris altaica, is an endangered species, and much more work is needed to protect this species, which is still vulnerable to extinction. Conservation efforts may be supported by the genetic assessment of wild populations, for which highly specific microsatellite markers are required. However, only a limited amount of genetic sequence data is available for this species. To identify the genes involved in the lung transcriptome and to develop additional simple sequence repeat (SSR) markers for the Siberian tiger, we used high-throughput RNA-Seq to characterize the Siberian tiger transcriptome in lung tissue (designated 'PTA-lung') and a pooled tissue sample (designated 'PTA'). Approximately 47.5% (33,187/69,836) of the lung transcriptome was annotated in four public databases (Nr, Swiss-Prot, KEGG, and COG). The annotated genes formed a potential pool for gene identification in the tiger. An analysis of the genes differentially expressed between the PTA-lung and PTA samples revealed that the tiger may have suffered a series of diseases before death. In total, 1062 non-redundant SSRs were identified in the Siberian tiger transcriptome. Forty-three primer pairs were randomly selected for amplification reactions, and 26 of the 43 pairs were also used to evaluate the levels of genetic polymorphism. Fourteen primer pairs (32.56%) amplified products that were polymorphic in size in P. tigris altaica. In conclusion, the transcriptome sequences will provide a valuable genomic resource for genetic research, and these new SSR markers comprise a reasonable number of loci for the genetic analysis of wild and captive populations of P. tigris altaica.

  15. Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Honglin Chen

    Full Text Available Mung bean (Vigna radiata (L.) Wilczek) is an important traditional food legume crop with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high-quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value < 1.0E-5). A total of 6,585 (8.3%) were mapped onto 244 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Among the unigenes, 10,053 sequences contained a unique simple sequence repeat (SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of high-resolution genetic linkage maps, association ...

  16. PROSITE, a protein domain database for functional characterization and annotation.

    Science.gov (United States)

    Sigrist, Christian J A; Cerutti, Lorenzo; de Castro, Edouard; Langendijk-Genevaux, Petra S; Bulliard, Virginie; Bairoch, Amos; Hulo, Nicolas

    2010-01-01

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily-specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.

  17. In silico mining for simple sequence repeat loci in a pineapple expressed sequence tag database and cross-species amplification of EST-SSR markers across Bromeliaceae.

    Science.gov (United States)

    Wöhrmann, Tina; Weising, Kurt

    2011-08-01

    A collection of 5,659 expressed sequence tags (ESTs) from pineapple [Ananas comosus (L.) Merr.] was screened for simple sequence repeats (EST-SSRs) with motif lengths between 1 and 6 bp. Lower thresholds of 15, 7 and 5 repeat units were used to define microsatellites of the mono-, di-, and tri- to hexanucleotide repeat type, respectively. Based on these criteria, 696 SSRs were identified among 3,389 EST unigenes, together representing 2,840 kb. This corresponds to an average density of one SSR every 4.1 kb of non-redundant EST sequences. Dinucleotide repeats were most abundant (38.4% of all SSRs) followed by trinucleotide repeats (38.1%). Flanking primer pairs were designed for 537 EST-SSR loci, and 49 of these were screened for their functionality in 12 accessions of A. comosus, 14 accessions of 5 additional Ananas species and 1 species of Pseudananas. Distinct PCR products of the expected size range were obtained with 36 primer pairs. Eighteen loci analyzed in more detail were all polymorphic in pineapple, and primer pairs flanking these loci also generated PCR products from a wide range of genera and species from six subfamilies of the Bromeliaceae. The potential to reveal polymorphism in a heterologous target species was demonstrated in Deuterocohnia brevifolia (subfamily Pitcairnioideae).
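
    The screening criteria in this record (motifs of 1 to 6 bp, with minimum repeat counts of 15, 7 and 5 for mono-, di-, and tri- to hexanucleotide repeats) translate directly into a pattern search. The sketch below is one possible way to apply those thresholds with Python regular expressions; it is not the tool the authors used, and the test sequence and function names are hypothetical. Motifs that are themselves repeats of a shorter unit (for example "AA" as a dinucleotide) are skipped so that each repeat is reported once, under its shortest motif.

```python
import re

# Minimum number of repeat units per motif length, as stated in the abstract.
MIN_REPEATS = {1: 15, 2: 7, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_repeats in MIN_REPEATS.items():
        # A motif followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def ssr_density_kb(total_kb, n_ssrs):
    """Average number of kilobases of sequence per SSR."""
    return total_kb / n_ssrs


# Hypothetical sequence with one di-, one tri- and one mononucleotide repeat.
example = "CCGT" + "AG" * 8 + "TTCC" + "GAA" * 5 + "C" + "A" * 15 + "G"
for start, motif, repeats in find_ssrs(example):
    print(f"position {start}: ({motif}){repeats}")

# Density from the totals quoted in the abstract: 696 SSRs in 2,840 kb.
print(f"one SSR every {ssr_density_kb(2840, 696):.1f} kb")
```

    This sketch finds perfect repeats only and reports the motif in the reading frame where the match starts, so the same repeat may be labelled GAA or AGA depending on its flanking bases. Dedicated SSR-mining tools usually also handle compound and interrupted repeats, which are outside the scope of this example.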

  18. De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of EST-SSR markers.

    Science.gov (United States)

    Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis

    2017-07-30

    The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.

  19. A combined functional and structural genomics approach identified an EST-SSR marker with complete linkage to the Ligon lintless-2 genetic locus in cotton (Gossypium hirsutum L.)

    Directory of Open Access Journals (Sweden)

    Tang Yuhong

    2011-09-01

    Full Text Available Background: Cotton fiber length is an important quality attribute to the textile industry and longer fibers can be more efficiently spun into yarns to produce superior fabrics. There is typically a negative correlation between yield and fiber quality traits such as length. An understanding of the regulatory mechanisms controlling fiber length can potentially provide a valuable tool for cotton breeders to improve fiber length while maintaining high yields. The cotton (Gossypium hirsutum L.) fiber mutation Ligon lintless-2 is controlled by a single dominant gene (Li2) that results in significantly shorter fibers than a wild-type. In a near-isogenic state with a wild-type cotton line, Li2 is a model system with which to study fiber elongation. Results: Two near-isogenic lines of Ligon lintless-2 (Li2) cotton, one mutant and one wild-type, were developed through five generations of backcrosses (BC5). An F2 population was developed from a cross between the two Li2 near-isogenic lines and used to develop a linkage map of the Li2 locus on chromosome 18. Five simple sequence repeat (SSR) markers were closely mapped around the Li2 locus region with two of the markers flanking the Li2 locus at 0.87 and 0.52 centimorgan. No apparent differences in fiber initiation and early fiber elongation were observed between the mutant ovules and the wild-type ones. Gene expression profiling using microarrays suggested roles of reactive oxygen species (ROS) homeostasis and cytokinin regulation in the Li2 mutant phenotype. Microarray gene expression data led to successful identification of an EST-SSR marker (NAU3991) that displayed complete linkage to the Li2 locus. Conclusions: In the field of cotton genomics, we report the first successful conversion of gene expression data into an SSR marker that is associated with a genomic region harboring a gene responsible for a fiber trait. The EST-derived SSR marker NAU3991 displayed complete linkage to the Li2 locus on ...

  20. Characterizing and annotating the genome using RNA-seq data.

    Science.gov (United States)

    Chen, Geng; Shi, Tieliu; Shi, Leming

    2017-02-01

    Bioinformatics methods for various RNA-seq data analyses are evolving rapidly with the improvement of sequencing technologies. However, many challenges remain in processing RNA-seq data efficiently to obtain accurate and comprehensive results. Here we review the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and to the transcriptome are two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially long noncoding RNAs) can still be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of the corresponding analyses.

  1. Assessment of Functional EST-SSR Markers (Sugarcane) in Cross-Species Transferability, Genetic Diversity among Poaceae Plants, and Bulk Segregation Analysis

    Directory of Open Access Journals (Sweden)

    Shamshad Ul Haq

    2016-01-01

    Full Text Available Expressed sequence tags (ESTs) are an important resource for gene discovery, the study of gene expression and its regulation, molecular marker development, and comparative genomics. We procured 10,000 ESTs and analyzed 267 EST-SSR markers through a computational approach. The average density was one SSR/10.45 kb or 6.4% frequency, wherein trinucleotide repeats (66.74%) were the most abundant followed by di- (26.10%), tetra- (4.67%), penta- (1.5%), and hexanucleotide (1.2%) repeats. Functional annotation was performed, after which 63 newly developed EST-SSRs were used for cross transferability, genetic diversity, and bulk segregation analysis (BSA). Out of 63 EST-SSRs, 42 markers were identified owing to their expansion genetics across 20 different plants; these amplified 519 alleles at 180 loci with an average of 2.88 alleles/locus, and the polymorphic information content (PIC) ranged from 0.51 to 0.93 with an average of 0.83. The cross transferability ranged from 25% for wheat to 97.22% for Schlerostachya, with an average of 55.86%, and genetic relationships were established based on diversification among them. Moreover, 10 EST-SSRs were recognized as important markers between bulks of pooled DNA of sugarcane cultivars through BSA. This study highlights the usefulness of the markers for transferability studies, for assessing genetic diversity in grass species, and for distinguishing sugarcane bulks.

  2. Characterization of grinding wheels: An annotated Bibliography. Final report

    Energy Technology Data Exchange (ETDEWEB)

    McClung, R.W.

    1995-12-01

    The characteristics of grinding wheels, after both fabrication and periods of operation, have a significant effect on the processed surface and the mechanical properties of advanced ceramics. An extensive literature survey and review has been conducted to determine and catalogue the various characterization methods that have been investigated and reported. Although many of the references address the grinding of metals, their historical and technical merit justifies their inclusion in this bibliography. For convenience, the references have been subdivided into nine subheadings: nondestructive examination; elasticity and stiffness; wheel hardness; topography and profilometry; observation of texture of wheel surfaces; wheel wear; in-process monitoring of grinding, acoustic emission, other; characteristics of ground surfaces; and miscellaneous.

  3. simple sequence repeats (EST-SSR)

    African Journals Online (AJOL)

    Yomi

    2012-01-19

    Jan 19, 2012 ... 212 primer pairs selected, based on repeat patterns of n≥8 for di-, tri-, tetra- and penta-nucleotide repeat ... Cluster analysis revealed a high genetic similarity among the sugarcane (Saccharum spp.) breeding lines which could reduce the genetic gain in ..... The multiple allele characteristic of SSR com-.

  4. (EST-SSR) markers in radi

    African Journals Online (AJOL)

    user2

    2013-02-27

    Feb 27, 2013 ... newer molecular marker systems, such as microsatellite. *Corresponding ... recent years, a few molecular marker systems including random ...... markers for estimating genetic diversity in cucumber. Biologia. Plantarum. 55(3):577-580. Huang H, Lu J, Ren Z, Hunter W, Dowd SE, Dang P (2011). Mining and.

  5. Characterization of common carp transcriptome: sequencing, de novo assembly, annotation and comparative genomics.

    Directory of Open Access Journals (Sweden)

    Peifeng Ji

    Full Text Available BACKGROUND: Common carp (Cyprinus carpio) is one of the most important aquaculture species of Cyprinidae, with an annual global production of 3.4 million tons, accounting for nearly 14% of the freshwater aquaculture production in the world. Due to the economic and ecological importance of common carp, genomic data are eagerly needed for genetic improvement. However, sufficient transcriptome data are still not available. The objective of the project is to sequence the transcriptome deeply and provide well-assembled transcriptome sequences to the common carp research community. RESULT: Transcriptome sequencing of common carp was performed using the Roche 454 platform. A total of 1,418,591 clean ESTs were collected and assembled into 36,811 cDNA contigs, with an average length of 888 bp and an N50 length of 1,002 bp. Annotation was performed and a total of 19,165 unique proteins were identified from the assembled contigs. Gene ontology and KEGG analyses were performed and classified all contigs into functional categories for understanding gene functions and regulation pathways. Open Reading Frames (ORFs) were detected from 29,869 (81.1%) contigs with an average ORF length of 763 bp. From these contigs, 9,625 full-length cDNAs were identified with sequence lengths from 201 bp to 9,956 bp. Comparative analysis revealed that 27,693 (75.2%) contigs have significant similarity to zebrafish RefSeq proteins, and 24,371 (66.2%), 24,501 (66.5%) and 25,025 (70.0%) to tetraodon, medaka and three-spined stickleback RefSeq proteins, respectively. A total of 2,064 microsatellites were initially identified from 1,730 contigs, and 1,639 unique sequences had sufficient flanking sequences on both sides for primer design. CONCLUSION: The transcriptome of common carp has been deeply sequenced, de novo assembled and characterized, providing a valuable resource for better understanding of the common carp genome. The transcriptome data will facilitate future functional studies on common carp genome, and
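
    The N50 length quoted above is a standard assembly summary statistic: the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of that definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy assembly (hypothetical lengths in bp).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    Because N50 weights long contigs more heavily than the mean does, it is typically larger than the average contig length, as in the 1,002 bp versus 888 bp figures reported here.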

  6. Soil-characterization and soil-amendment use on coal surface mine lands: An annotated bibliography. Information Circular/1991

    International Nuclear Information System (INIS)

    Norland, M.R.; Veith, D.L.

    1991-01-01

    The U.S. Bureau of Mines Report on United States and Canadian Literature pertaining to soil characterization and the use of soil amendments as a part of the reclamation process of coal surface-mined lands contains 1,280 references. The references were published during the 1977 to 1988 period. Each reference is evaluated by keywords, providing the reader with a means of rapidly sorting through the references to locate those articles with the coal mining regions and subjects of interest. All references are annotated

  7. New gSSR and EST-SSR markers reveal high genetic diversity in the invasive plant Ambrosia artemisiifolia L. and can be transferred to other invasive Ambrosia species.

    Directory of Open Access Journals (Sweden)

    Lucie Meyer

    Full Text Available Ambrosia artemisiifolia L. (common ragweed) is an annual invasive and highly troublesome plant species originating from North America that has become widespread across Europe. New sets of genomic and expressed sequence tag (EST) based simple sequence repeat (SSR) markers were developed in this species using three approaches. After validation, 13 genomic SSRs and 13 EST-SSRs were retained and used to characterize the genetic diversity and population genetic structure of Ambrosia artemisiifolia populations from the native (North America) and invasive (Europe) ranges of the species. Analysing the mating system based on maternal families did not reveal any departure from complete allogamy, and excess homozygosity was mostly due to the presence of null alleles. High genetic diversity and patterns of genetic structure in Europe suggest two main introduction events followed by secondary colonization events. Cross-species transferability of the newly developed markers to other invasive species of the Ambrosia genus was assessed. Sixty-five percent and 75% of markers, respectively, were transferable from A. artemisiifolia to Ambrosia psilostachya and Ambrosia tenuifolia. Forty percent were transferable to Ambrosia trifida, this latter species being seemingly more phylogenetically distant from A. artemisiifolia than the former two.

  8. Construction of an EST-SSR-based interspecific transcriptome ...

    Indian Academy of Sciences (India)

    Quantitative trait locus (QTL) mapping is an important method in marker-assisted selection breeding. Many studies on the QTLs focus on cotton fibre yield and quality; however, most are conducted at the DNA level, which may reveal null QTLs. Hence, QTL mapping based on transcriptome maps at the cDNA level is often ...

  9. Construction of an EST-SSR-based interspecific transcriptome ...

    Indian Academy of Sciences (India)

    age groups, with 37 remaining loci unmapped. The total length of the transcriptome linkage map was 1938.72cM. (table 2; figure 1). The longest linkage group was 132.74 cM with 15 loci (LG05/Chr05), while the shortest linkage group was 2.33cM with two loci (LG31/Chr25); generally, the average length of a linkage group ...

  10. Construction of an EST-SSR-based interspecific transcriptome ...

    Indian Academy of Sciences (India)

    DPL. 21. 6 (28.6%). 0 (0). Gh. 8. 0 (0). 0 (0). MUSB. 6. 1 (16.7%). 0 (0). Total. 1270. 570 (44.9%). 303 (23.86%). Table 2. Basic information on the F2 transcriptome linkage map based on developing fibres at five DPA. Linkage group. Chromosome. Length (cM). Total loci. Average distance. Largest gap (cM). 1. Chr01. 78.53.

  11. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Science.gov (United States)

    Liu, Hongliang; Wang, Tingting; Wang, Jinke; Quan, Fusheng; Zhang, Yong

    2013-01-01

    Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  12. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  13. NFP: An R Package for Characterizing and Comparing of Annotated Biological Networks.

    Science.gov (United States)

    Cao, Yang; Xu, Wenjian; Niu, Chao; Bo, Xiaochen; Li, Fei

    2017-01-01

    Large amounts of various biological networks exist for representing different types of interaction data, such as genetic, metabolic, gene regulatory, and protein-protein relationships. Recent approaches on biological network study are based on different mathematical concepts. It is necessary to construct a uniform framework to judge the functionality of biological networks. We recently introduced a knowledge-based computational framework that reliably characterized biological networks in system level. The method worked by making systematic comparisons to a set of well-studied "basic networks," measuring both the functional and topological similarities. A biological network could be characterized as a spectrum-like vector consisting of similarities to basic networks. Here, to facilitate the application, development, and adoption of this framework, we present an R package called NFP. This package extends our previous pipeline, offering a powerful set of functions for Network Fingerprint analysis. The software shows great potential in biological network study. The open source NFP R package is freely available under the GNU General Public License v2.0 at CRAN along with the vignette.

  14. NFP: An R Package for Characterizing and Comparing of Annotated Biological Networks

    Directory of Open Access Journals (Sweden)

    Yang Cao

    2017-01-01

    Full Text Available Large amounts of various biological networks exist for representing different types of interaction data, such as genetic, metabolic, gene regulatory, and protein-protein relationships. Recent approaches on biological network study are based on different mathematical concepts. It is necessary to construct a uniform framework to judge the functionality of biological networks. We recently introduced a knowledge-based computational framework that reliably characterized biological networks in system level. The method worked by making systematic comparisons to a set of well-studied “basic networks,” measuring both the functional and topological similarities. A biological network could be characterized as a spectrum-like vector consisting of similarities to basic networks. Here, to facilitate the application, development, and adoption of this framework, we present an R package called NFP. This package extends our previous pipeline, offering a powerful set of functions for Network Fingerprint analysis. The software shows great potential in biological network study. The open source NFP R package is freely available under the GNU General Public License v2.0 at CRAN along with the vignette.

  15. Development and Characterization of Microsatellite Markers from the Transcriptome of Firmiana danxiaensis (Malvaceae s.l.)

    Directory of Open Access Journals (Sweden)

    Qiang Fan

    2013-11-01

    Full Text Available Premise of the study: Firmiana consists of 12–16 species, many of which are narrow endemics. Expressed sequence tag (EST)–simple sequence repeat (SSR) markers were developed and characterized for size polymorphism in four Firmiana species. Methods and Results: A total of 102 EST-SSR primer pairs were designed based on the transcriptome sequences of F. danxiaensis; these were then characterized in four Firmiana species: F. danxiaensis, F. kwangsiensis, F. hainanensis, and F. simplex. In these four species, 17 primer pairs were successfully amplified, and 14 were polymorphic in at least one species. The number of alleles ranged from one to 13, and the observed and expected heterozygosities ranged from 0 to 1 and 0 to 0.925, respectively. The lowest level of polymorphism was observed in F. danxiaensis. Conclusions: These polymorphic EST-SSR markers are valuable for conservation genetics studies in the endangered Firmiana species.
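
    Observed and expected heterozygosity, as reported above, can be computed per locus from diploid genotype calls. The sketch below is illustrative only: it assumes the simple estimator He = 1 - sum(p_i^2) without a small-sample correction (the abstract does not say which estimator the authors used), and the function name and the allele sizes in the example are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    `genotypes` is a list of (allele_a, allele_b) pairs, one per individual.
    Expected heterozygosity uses He = 1 - sum(p_i^2), with no
    small-sample correction (an assumption of this sketch).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected: derived from pooled allele frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical SSR allele sizes (bp) for four individuals.
print(heterozygosity([(150, 150), (150, 156), (156, 156), (150, 156)]))
```

    A monomorphic locus gives 0 for both values, which matches the lower bound of the ranges reported in the abstract.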

  16. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.
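
    To illustrate what screening unigenes for SSRs of a minimum length involves, the sketch below finds perfect di- to hexanucleotide repeats spanning at least 12 bp with a regular expression. It is not the tool used in the study: the 12 bp threshold is applied here as "at least 12 bp", the motif size range is an assumption, and the function names and example sequence are hypothetical.

```python
import re


def _is_shorter_repeat(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ACAC)."""
    k = len(motif)
    return any(
        k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k)
    )


def find_ssrs(seq, min_len=12, motif_sizes=(2, 3, 4, 5, 6)):
    """Return (start, motif, repeat_count) for perfect SSRs in `seq`.

    A hit is a motif of size k repeated in tandem so that the whole
    tract spans at least `min_len` bases. Matches are non-overlapping
    within each motif size, so overlapping tracts may be missed.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        min_repeats = -(-min_len // k)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_shorter_repeat(motif):
                continue  # already reported under the shorter motif size
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence containing (AC)6 and (AAG)4 tracts.
example = "GGG" + "AC" * 6 + "TTGCA" + "AAG" * 4 + "C"
print(find_ssrs(example))
```

    Dedicated SSR-mining programs also handle compound and imperfect repeats and usually take per-motif-size repeat thresholds, which this sketch does not attempt.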

  17. Annotated bibliography

    International Nuclear Information System (INIS)

    1997-08-01

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners

  18. Exploiting "Subjective" Annotations

    NARCIS (Netherlands)

    Reidsma, Dennis; op den Akker, Hendrikus J.A.; Artstein, R.; Boleda, G.; Keller, F.; Schulte im Walde, S.

    2008-01-01

    Many interesting phenomena in conversation can only be annotated as a subjective task, requiring interpretative judgements from annotators. This leads to data which is annotated with lower levels of agreement not only due to errors in the annotation, but also due to the differences in how annotators

  19. Building Simple Annotation Tools

    OpenAIRE

    Lin, Gordon

    2016-01-01

    The right annotation tool does not always exist for processing a particular natural language task. In these scenarios, researchers are required to build new annotation tools to fit the tasks at hand. However, developing new annotation tools is difficult and inefficient. There has not been careful consideration of software complexity in current annotation tools. Due to the problems of complexity, new annotation tools must reimplement common annotation features despite the availability of imple...

  20. Mining GO annotations for improving annotation consistency.

    Directory of Open Access Journals (Sweden)

    Daniel Faria

    Full Text Available Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
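
    The "basic association rule learning methodology" mentioned above rests on two standard measures for a candidate rule A -> B: support (the fraction of items annotated with both terms) and confidence (the fraction of items with A that also have B). The sketch below computes both over a toy annotation table. It illustrates only the generic measures, not the authors' refined algorithm; the protein names and term labels are placeholders, not real identifiers.

```python
def rule_metrics(annotations, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    `annotations` maps each protein to the set of terms assigned to it.
    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    """
    n = len(annotations)
    if n == 0:
        raise ValueError("need at least one annotated protein")
    with_a = [terms for terms in annotations.values() if antecedent in terms]
    with_both = [terms for terms in with_a if consequent in terms]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


# Placeholder proteins and term labels (hypothetical).
toy = {
    "P1": {"term_A", "term_B"},
    "P2": {"term_A", "term_B"},
    "P3": {"term_A"},
    "P4": {"term_C"},
}
print(rule_metrics(toy, "term_A", "term_B"))
```

    A rule with high confidence but imperfect coverage, such as term_A -> term_B here, points to proteins like P3 where the second annotation may be missing, which is the kind of inconsistency the abstract describes.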

  1. Phylogenetic molecular function annotation

    International Nuclear Information System (INIS)

    Engelhardt, Barbara E; Jordan, Michael I; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic approach for predicting molecular function (sometimes called 'phylogenomics') is an effective means to predict protein molecular function. These methods incorporate functional evidence from all members of a family that have functional characterizations using the evolutionary history of the protein family to make robust predictions for the uncharacterized proteins. However, they are often difficult to apply on a genome-wide scale because of the time-consuming step of reconstructing the phylogenies of each protein to be annotated. Our automated approach for function annotation using phylogeny, the SIFTER (Statistical Inference of Function Through Evolutionary Relationships) methodology, uses a statistical graphical model to compute the probabilities of molecular functions for unannotated proteins. Our benchmark tests showed that SIFTER provides accurate functional predictions on various protein families, outperforming other available methods.

  2. Annotating Coloured Petri Nets

    DEFF Research Database (Denmark)

    Lindstrøm, Bo; Wells, Lisa Marie

    2002-01-01

    a method which makes it possible to associate auxiliary information, called annotations, with tokens without modifying the colour sets of the CP-net. Annotations are pieces of information that are not essential for determining the behaviour of the system being modelled, but are rather added to support...... a certain use of the CP-net. We define the semantics of annotations by describing a translation from a CP-net and the corresponding annotation layers to another CP-net where the annotations are an integrated part of the CP-net....

  3. Ubiquitous Annotation Systems

    DEFF Research Database (Denmark)

    Hansen, Frank Allan

    2006-01-01

    Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...

  4. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    Science.gov (United States)

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  5. Development and characterization of a Psathyrostachys huashanica Keng 7Ns chromosome addition line with leaf rust resistance.

    Directory of Open Access Journals (Sweden)

    Wanli Du

    Full Text Available The aim of this study was to characterize a Triticum aestivum-Psathyrostachys huashanica Keng (2n = 2x = 14, NsNs) disomic addition line 2-1-6-3. Individual line 2-1-6-3 plants were analyzed using cytological, genomic in situ hybridization (GISH), EST-SSR, and EST-STS techniques. The alien addition line 2-1-6-3 was shown to have two P. huashanica chromosomes, with a meiotic configuration of 2n = 44 = 22 II. We tested 55 EST-SSR and 336 EST-STS primer pairs that mapped onto seven different wheat chromosomes using DNA from parents and the P. huashanica addition line. One EST-SSR and nine EST-STS primer pairs indicated that the additional chromosome of P. huashanica belonged to homoeologous group 7. The diagnostic fragments of five EST-STS markers (BE404955, BE591127, BE637663, BF482781 and CD452422) were cloned, sequenced and compared. The results showed that the amplified polymorphic bands of P. huashanica and disomic addition line 2-1-6-3 shared 100% sequence identity, and the line was designated as the 7Ns disomic addition line. Disomic addition line 2-1-6-3 was evaluated to test the leaf rust resistance of adult stages in the field. We found that one pair of the 7Ns genome chromosomes carried new leaf rust resistance gene(s). Moreover, wheat line 2-1-6-3 had superior numbers of florets and grains per spike, which were associated with the introgression of the paired P. huashanica chromosomes. These high levels of disease resistance and stable, excellent agronomic traits suggest that this line could be utilized as a novel donor in wheat breeding programs.

  6. Genome-wide annotation of the soybean WRKY family and functional characterization of genes involved in response to Phakopsora pachyrhizi infection.

    Science.gov (United States)

    Bencke-Malato, Marta; Cabreira, Caroline; Wiebke-Strohm, Beatriz; Bücker-Neto, Lauro; Mancini, Estefania; Osorio, Marina B; Homrich, Milena S; Turchetto-Zolet, Andreia Carina; De Carvalho, Mayra C C G; Stolf, Renata; Weber, Ricardo L M; Westergaard, Gastón; Castagnaro, Atílio P; Abdelnoor, Ricardo V; Marcelino-Guimarães, Francismar C; Margis-Pinheiro, Márcia; Bodanese-Zanettini, Maria Helena

    2014-09-10

    Many previous studies have shown that soybean WRKY transcription factors are involved in the plant response to biotic and abiotic stresses. Phakopsora pachyrhizi is the causal agent of Asian Soybean Rust, one of the most important soybean diseases. There is evidence that WRKYs are involved in the resistance of some soybean genotypes against that fungus. The WRKY genes already annotated in the soybean genome were underrepresented. In the present study, a genome-wide annotation of the soybean WRKY family was carried out and members involved in the response to P. pachyrhizi were identified. As a result of a search of soybean genomic databases, 182 WRKY-encoding genes were annotated and 33 putative pseudogenes identified. Genes involved in the response to P. pachyrhizi infection were identified using superSAGE, RNA-Seq of microdissected lesions and microarray experiments. Seventy-five genes were differentially expressed during fungal infection. The expression of eight WRKY genes was validated by RT-qPCR. The expression of these genes in a resistant genotype was earlier and/or stronger compared with a susceptible genotype in response to P. pachyrhizi infection. Soybean somatic embryos were transformed in order to overexpress or silence WRKY genes. Embryos overexpressing a WRKY gene were obtained, but they were unable to convert into plants. When infected with P. pachyrhizi, the leaves of the silenced transgenic line showed a higher number of lesions than the wild-type plants. The present study reports a genome-wide annotation of the soybean WRKY family. The participation of some members in response to P. pachyrhizi infection was demonstrated. The results contribute to the elucidation of gene function and suggest the manipulation of WRKYs as a strategy to increase fungal resistance in soybean plants.

  7. Contributions to In Silico Genome Annotation

    KAUST Repository

    Kalkatawi, Manal M.

    2017-11-30

    Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and time-consuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such a task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameter calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results.
Finally

  8. Development of genomic SSR and potential EST-SSR markers in ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-16

    Nov 16, 2009 ... Tel: +86-10-62818841. Var. parvifolia Shan et Y. Li, B. marginatum Wall. ex DC.,. B. bicaule Helm. and B. scorzonerifolium Willd. var. angustissimum (Franch.) Huang (Song, 2002). Many studies have shown the active component, such as saikosaponins, volatile oils and polysaccharides, varied remarkably ...

  9. Development of genomic SSR and potential EST-SSR markers in ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-16

    Nov 16, 2009 ...
    CH025G05-1   (AAG)5          F: CTCCATTCCTCCTTTGTTAGTC    166   57   GR308118
                                 R: TGAACCGAATCTATTGGGTGAAA
    CH025G05-2   (ATC)5          F: CAATAGATTCGGTTCAAGTTCAG   318   54   GR308118
                                 R: ATCAAAGCAAAGGTGGCAAAT
    CH026D06     (AAG)5+(AAG)6   F: TTGGGCATGACAATCACAGAA     220

  10. Semantic annotation of mutable data.

    Directory of Open Access Journals (Sweden)

    Robert A Morris

    Full Text Available Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.

  11. Annotating individual human genomes.

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

    2011-10-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.

  12. ANNOTATING INDIVIDUAL HUMAN GENOMES*

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

    2014-01-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162

  13. GSV Annotated Bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2010-09-14

    The following annotated bibliography was developed as part of the geospatial algorithm verification and validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models. Many other papers were studied during the course of the investigation. The annotations for these articles can be found in the paper "On the verification and validation of geospatial image analysis algorithms".

  14. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.

  15. Annotating Emotions in Meetings

    NARCIS (Netherlands)

    Reidsma, Dennis; Heylen, Dirk K.J.; Ordelman, Roeland J.F.

    We present the results of two trials testing procedures for the annotation of emotion and mental state of the AMI corpus. The first procedure is an adaptation of the FeelTrace method, focusing on a continuous labelling of emotion dimensions. The second method is centered around more discrete

  16. Annotation of Regular Polysemy

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector

    Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...

  17. Ion implantation: an annotated bibliography

    International Nuclear Information System (INIS)

    Ting, R.N.; Subramanyam, K.

    1975-10-01

    Ion implantation is a technique for introducing controlled amounts of dopants into target substrates, and has been successfully used for the manufacture of silicon semiconductor devices. Ion implantation is superior to other methods of doping such as thermal diffusion and epitaxy, in view of its advantages such as high degree of control, flexibility, and amenability to automation. This annotated bibliography of 416 references consists of journal articles, books, and conference papers in English and foreign languages published during 1973-74, on all aspects of ion implantation including range distribution and concentration profile, channeling, radiation damage and annealing, compound semiconductors, structural and electrical characterization, applications, equipment and ion sources. Earlier bibliographies on ion implantation, and national and international conferences in which papers on ion implantation were presented have also been listed separately

  18. GSV Annotated Bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2011-06-14

    The following annotated bibliography was developed as part of the Geospatial Algorithm Verification and Validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and Validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: Image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models.

  19. Diverse Image Annotation

    KAUST Repository

    Wu, Baoyuan

    2017-11-09

    In this work we study the task of image annotation, the goal of which is to describe an image using a few tags. Instead of predicting the full list of tags, we aim to provide a short list with a limited number of tags (e.g., 3) that covers as much of the image's information as possible. The tags in such a short list should be representative and diverse: they must not only correspond to the contents of the image but also differ from each other. To this end, we treat image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which formulates representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in a semantic hierarchy or in a pair of synonyms should not be selected simultaneously. This requirement is then embedded into the sampling algorithm according to the learned conditional DPP model. In addition, we find that traditional metrics for image annotation (e.g., precision, recall and F1 score) consider only representation and ignore diversity. We therefore propose new metrics to evaluate the quality of the selected subset (i.e., the tag list), based on the semantic hierarchy and synonyms. A human study conducted through Amazon Mechanical Turk verifies that the proposed metrics are closer to human judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags than existing image annotation methods.
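
    The idea of scoring a tag subset by representation and diversity jointly can be illustrated with a small sketch. It is not the authors' learned conditional DPP or their sampling algorithm: it assumes a kernel of the common form L_ij = q_i * S_ij * q_j, with hypothetical per-tag quality scores q and pairwise similarities S, and it replaces sampling with a simple greedy search that maximizes det(L_S). The function names and the excluded_pairs argument (standing in for the hierarchy and synonym constraint) are illustrative assumptions.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_score(quality, similarity, subset):
    """Unnormalised DPP score det(L_S), assuming L_ij = q_i * S_ij * q_j."""
    sub = [[quality[i] * similarity[i][j] * quality[j] for j in subset]
           for i in subset]
    return det(sub)


def greedy_select(quality, similarity, k, excluded_pairs=frozenset()):
    """Greedily pick up to k tag indices that maximise the DPP score.

    excluded_pairs holds frozensets {i, j} of tags that must not be
    selected together (hypothetical hierarchy or synonym pairs).
    """
    chosen = []
    for _ in range(k):
        best, best_score = None, 0.0
        for cand in range(len(quality)):
            if cand in chosen:
                continue
            if any(frozenset((cand, c)) in excluded_pairs for c in chosen):
                continue
            score = dpp_score(quality, similarity, chosen + [cand])
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            break
        chosen.append(best)
    return chosen


# Made-up example data: indices 0..3 stand for "dog", "puppy", "grass", "animal".
quality = [0.9, 0.85, 0.6, 0.7]
similarity = [
    [1.0, 0.95, 0.1, 0.6],
    [0.95, 1.0, 0.1, 0.6],
    [0.1, 0.1, 1.0, 0.1],
    [0.6, 0.6, 0.1, 1.0],
]
excluded = frozenset({frozenset((0, 1)), frozenset((0, 3))})
selected = greedy_select(quality, similarity, 2, excluded)
```

    Because the determinant shrinks when two selected tags are highly similar, a relevant but redundant pair such as "dog" and "puppy" scores lower than a slightly less relevant but more diverse pair such as "dog" and "grass".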

  20. A Factor Graph Approach to Automated GO Annotation.

    Directory of Open Access Journals (Sweden)

    Flavio E Spetale

    Full Text Available As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with a main focus on annotation precision, and heuristic alternatives, with a main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.
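
    To make "checking the consistency of predefined GO relationships" concrete, the sketch below applies a simple top-down heuristic (the true-path rule): a term's score is capped by the scores of its ancestors, since an annotation with a term implies annotation with every ancestor. This is one of the simpler heuristic strategies, not the factor graph message passing described in the abstract; the function name, the scores and the three-level hierarchy are illustrative assumptions.

```python
def enforce_true_path(scores, parents):
    """Cap each term's score by the (already capped) scores of its parents.

    scores  : dict mapping term -> raw classifier score in [0, 1]
    parents : dict mapping term -> iterable of direct parent terms
    """
    consistent = {}

    def resolve(term):
        if term in consistent:
            return consistent[term]
        value = scores[term]
        for parent in parents.get(term, ()):
            value = min(value, resolve(parent))
        consistent[term] = value
        return value

    for term in scores:
        resolve(term)
    return consistent


# Illustrative raw predictions for a small slice of a term hierarchy.
raw_scores = {
    "molecular_function": 0.9,
    "binding": 0.4,
    "ion binding": 0.7,
    "catalytic activity": 0.8,
}
term_parents = {
    "binding": ["molecular_function"],
    "ion binding": ["binding"],
    "catalytic activity": ["molecular_function"],
}
consistent_scores = enforce_true_path(raw_scores, term_parents)
```

    Here the raw score of "ion binding" (0.7) exceeds that of its parent "binding" (0.4), so it is lowered to 0.4; the other terms are already consistent and keep their scores.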

  1. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  2. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  3. Impingement: an annotated bibliography

    International Nuclear Information System (INIS)

    Uziel, M.S.; Hannon, E.H.

    1979-04-01

    This bibliography of 655 annotated references on impingement of aquatic organisms at intake structures of thermal-power-plant cooling systems was compiled from the published and unpublished literature. The bibliography includes references from 1928 to 1978 on impingement monitoring programs; impingement impact assessment; applicable law; location and design of intake structures, screens, louvers, and other barriers; fish behavior and swim speed as related to impingement susceptibility; and the effects of light, sound, bubbles, currents, and temperature on fish behavior. References are arranged alphabetically by author or corporate author. Indexes are provided for author, keywords, subject category, geographic location, taxon, and title

  4. Detection and Characterization of Engineered Nanomaterials in the Environment: Current State-of-the-art and Future Directions Report, Annotated Bibliography, and Image Library

    Science.gov (United States)

    The increasing manufacture and implementation of engineered nanomaterials (ENMs) will continue to lead to the release of these materials into the environment. Reliably assessing the environmental exposure risk of ENMs will depend highly on the ability to quantify and characterize...

  5. Predicting word sense annotation agreement

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier

    2015-01-01

    High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotations. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...

  6. Annotation in Digital Scholarly Editions

    NARCIS (Netherlands)

    Boot, P.; Haentjens Dekker, R.

    2016-01-01

    Annotation in digital scholarly editions (of historical documents, literary works, letters, etc.) has long been recognized as an important desideratum, but has also proven to be an elusive ideal. In so far as annotation functionality is available, it is usually developed for a single edition and

  7. Mesotext. Framing and exploring annotations

    NARCIS (Netherlands)

    Boot, P.; Boot, P.; Stronks, E.

    2007-01-01

    From the introduction: Annotation is an important item on the wish list for digital scholarly tools. It is one of John Unsworth’s primitives of scholarship (Unsworth 2000). Especially in linguistics, a number of tools have been developed that facilitate the creation of annotations to source material

  8. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts; genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  9. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT) tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles

    Directory of Open Access Journals (Sweden)

    Boussaha Mekki

    2006-05-01

    Full Text Available Abstract Background: The Fragile Histidine Triad gene (FHIT) is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of the human common fragile sites, where FHIT lies. Vesical tumors also affect cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer in interplay with latent papillomavirus. Results: The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split into ten exons, as in man, with exons 5 to 9 coding for a 149-amino-acid protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced; they originate from alternative usage of three variants of exon 4 and are very close in size to the major human FHIT cDNAs. Conclusion: A comparative genomic approach made it possible to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will make it possible to study the role of FHIT in bovine cancerogenesis

  10. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased through the use of automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  11. Chado controller: advanced annotation management with a community annotation system.

    Science.gov (United States)

    Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie

    2012-04-01

    We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form. Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr. Supplementary data are available at Bioinformatics online.

  12. Automated genome sequence analysis and annotation.

    Science.gov (United States)

    Andrade, M A; Brown, N P; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C

    1999-05-01

    Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
The GeneQuiz system

  13. A framework for annotating human genome in disease context.

    Science.gov (United States)

    Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M

    2012-01-01

    Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in the biomedical literature, led by advances in genome-scale technologies, poses a challenge for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.

  14. Annotation-based feature extraction from sets of SBML models.

    Science.gov (United States)

    Alm, Rebekka; Waltemath, Dagmar; Wolfien, Markus; Wolkenhauer, Olaf; Henkel, Ron

    2015-01-01

    Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map onto concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison, or model-to-model comparison.

  15. Objective-guided image annotation.

    Science.gov (United States)

    Mao, Qi; Tsang, Ivor Wai-Hung; Gao, Shenghua

    2013-04-01

    Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so that they are inevitably trapped into suboptimal performance of these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to the analysis of various measures and the high time complexity of optimizing micro-averaging measures, in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. 
Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four
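
    The contrast this abstract draws between Hamming, macro-averaged and example-based measures can be illustrated with a small sketch. This is not the authors' framework (which optimizes such measures with structural SVMs); it only evaluates three standard measures on made-up data. The keyword vocabulary, the toy labels and the function names are all assumptions introduced for illustration.

```python
from typing import List, Set

def hamming_loss(true: List[Set[str]], pred: List[Set[str]], vocab: Set[str]) -> float:
    """Fraction of (image, keyword) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(true, pred))
    return wrong / (len(true) * len(vocab))

def example_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 computed per image, then averaged over images."""
    scores = []
    for t, p in zip(true, pred):
        if not t and not p:
            scores.append(1.0)  # nothing to predict, nothing predicted
        else:
            scores.append(2 * len(t & p) / (len(t) + len(p)))
    return sum(scores) / len(scores)

def macro_f1(true: List[Set[str]], pred: List[Set[str]], vocab: Set[str]) -> float:
    """F1 computed per keyword, then averaged over keywords."""
    scores = []
    for k in sorted(vocab):
        tp = sum(1 for t, p in zip(true, pred) if k in t and k in p)
        fp = sum(1 for t, p in zip(true, pred) if k not in t and k in p)
        fn = sum(1 for t, p in zip(true, pred) if k in t and k not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 1.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: "zebra" is an infrequent keyword that is missed once.
vocab = {"sky", "tree", "zebra"}
true_labels = [{"sky", "tree"}, {"sky"}, {"sky", "tree"}, {"zebra"}]
pred_labels = [{"sky", "tree"}, {"sky"}, {"sky", "tree"}, {"sky"}]

print("Hamming loss:", hamming_loss(true_labels, pred_labels, vocab))
print("Example-based F1:", example_f1(true_labels, pred_labels))
print("Macro-averaged F1:", macro_f1(true_labels, pred_labels, vocab))
```

    In this toy case a single missed rare keyword sets one of the three per-keyword scores to zero, so the macro average drops sharply, while the example-based score is reduced only by the one affected image.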

  16. Image annotation under X Windows

    Science.gov (United States)

    Pothier, Steven

    1991-08-01

    A mechanism for attaching graphic and overlay annotation to multiple bits/pixel imagery while providing levels of performance approaching that of native mode graphics systems is presented. This mechanism isolates programming complexity from the application programmer through software encapsulation under the X Window System. It ensures display accuracy throughout operations on the imagery and annotation including zooms, pans, and modifications of the annotation. Trade-offs that affect speed of display, consumption of memory, and system functionality are explored. The use of resource files to tune the display system is discussed. The mechanism makes use of an abstraction consisting of four parts: a graphics overlay, a dithered overlay, an image overlay, and a physical display window. Data structures are maintained that retain the distinction between the four parts so that they can be modified independently, providing system flexibility. A unique technique for associating user color preferences with annotation is introduced. An interface that allows interactive modification of the mapping between image value and color is discussed. A procedure that provides for the colorization of imagery on 8-bit display systems using pixel dithering is explained. Finally, the application of annotation mechanisms to various applications is discussed.

  17. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  18. Early experiences with crowdsourcing airway annotations in chest CT

    DEFF Research Database (Denmark)

    Cheplygina, Veronika; Perez-Rovira, Adria; Kuo, Wieying

    2016-01-01

    Measuring airways in chest computed tomography (CT) images is important for characterizing diseases such as cystic fibrosis, yet very time-consuming to perform manually. Machine learning algorithms offer an alternative, but need large sets of annotated data to perform well. We investigate whether...

  20. Alignment-Annotator web server: rendering and annotating sequence alignments.

    Science.gov (United States)

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-07-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Development and validation of 697 novel polymorphic genomic and EST-SSR markers in the American cranberry (Vaccinium macrocarpon Ait.).

    Science.gov (United States)

    Schlautman, Brandon; Fajardo, Diego; Bougie, Tierney; Wiesman, Eric; Polashock, James; Vorsa, Nicholi; Steffan, Shawn; Zalapa, Juan

    2015-01-27

    The American cranberry, Vaccinium macrocarpon Ait., is an economically important North American fruit crop that is consumed because of its unique flavor and potential health benefits. However, a lack of abundant, genome-wide molecular markers has limited the adoption of modern molecular assisted selection approaches in cranberry breeding programs. To increase the number of available markers in the species, this study identified, tested, and validated microsatellite markers from existing nuclear and transcriptome sequencing data. In total, new primers were designed, synthesized, and tested for 979 SSR loci; 697 of the markers amplified allele patterns consistent with single locus segregation in a diploid organism and were considered polymorphic. Of the 697 polymorphic loci, 507 were selected for additional genetic diversity and segregation analyses in 29 cranberry genotypes. More than 95% of the 507 loci did not display segregation distortion at the p < 0.05 level, and contained moderate to high levels of polymorphism with a polymorphic information content >0.25. This comprehensive collection of developed and validated microsatellite loci represents a substantial addition to the molecular tools available for geneticists, genomicists, and breeders in cranberry and Vaccinium.

  2. Development and Validation of 697 Novel Polymorphic Genomic and EST-SSR Markers in the American Cranberry (Vaccinium macrocarpon Ait.)

    Directory of Open Access Journals (Sweden)

    Brandon Schlautman

    2015-01-01

    Full Text Available The American cranberry, Vaccinium macrocarpon Ait., is an economically important North American fruit crop that is consumed because of its unique flavor and potential health benefits. However, a lack of abundant, genome-wide molecular markers has limited the adoption of modern molecular assisted selection approaches in cranberry breeding programs. To increase the number of available markers in the species, this study identified, tested, and validated microsatellite markers from existing nuclear and transcriptome sequencing data. In total, new primers were designed, synthesized, and tested for 979 SSR loci; 697 of the markers amplified allele patterns consistent with single locus segregation in a diploid organism and were considered polymorphic. Of the 697 polymorphic loci, 507 were selected for additional genetic diversity and segregation analyses in 29 cranberry genotypes. More than 95% of the 507 loci did not display segregation distortion at the p < 0.05 level, and contained moderate to high levels of polymorphism with a polymorphic information content >0.25. This comprehensive collection of developed and validated microsatellite loci represents a substantial addition to the molecular tools available for geneticists, genomicists, and breeders in cranberry and Vaccinium.
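
    The abstract reports a polymorphic information content (PIC) above 0.25 but does not state how it was computed. A minimal sketch, assuming the widely used definition of Botstein et al. (1980), PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2, where p_i is the frequency of allele i at a locus. The function name and the allele frequencies below are hypothetical and are not data from the study.

```python
from typing import Sequence

def polymorphic_information_content(freqs: Sequence[float]) -> float:
    """PIC of one locus from its allele frequencies (assumed Botstein et al. form)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    # Homozygosity term: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings whose offspring are uninformative.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross

# Hypothetical loci: two balanced alleles versus one dominant allele.
balanced = polymorphic_information_content([0.5, 0.5])
skewed = polymorphic_information_content([0.9, 0.1])
print(balanced, balanced > 0.25)
print(skewed, skewed > 0.25)
```

    Under this definition a biallelic locus can reach at most 0.375, so a threshold of 0.25 already excludes loci where one allele strongly dominates.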

  3. A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study

    Directory of Open Access Journals (Sweden)

    Cherubini Marcello

    2010-10-01

    Full Text Available Background: Expressed sequence tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of a SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and address the transferability of EST-SSRs in Castanea sativa (chestnut). Results: A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall, 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental tree for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a phylogenetically related species to oak, was higher. Conclusion: We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
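
    The mining step described in this abstract, scanning assembled EST sequences for SSR motifs, can be sketched with a regular expression that looks for a short motif repeated in tandem. This is an illustration only, not the pipeline used in the study: the minimum repeat counts, the function name and the example sequence are assumptions, and only perfect di-, tri- and tetranucleotide repeats are detected.

```python
import re
from typing import List, Tuple

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not counted here
            if k == 4 and motif[:2] == motif[2:]:
                continue  # e.g. ATAT is a dinucleotide repeat in disguise
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Made-up sequence with one dinucleotide and one trinucleotide repeat.
example = "GGC" + "AG" * 7 + "TTC" + "CTG" * 5 + "A"
ssrs = find_ssrs(example)
print(ssrs)
```

    Each hit gives the 0-based start position, the repeated motif and the number of copies; flanking sequence around such hits is what primer design would then work from.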

  4. Use of EST-SSR markers for evaluating genetic diversity and fingerprinting celery (Apium graveolens L.) cultivars.

    Science.gov (United States)

    Fu, Nan; Wang, Ping-Yong; Liu, Xiao-Dan; Shen, Huo-Lin

    2014-02-10

    Celery (Apium graveolens L.) is one of the most economically important vegetables worldwide, but genetic and genomic resources supporting celery molecular breeding are quite limited, thus few studies on celery have been conducted so far. In this study we made use of simple sequence repeat (SSR) markers generated from previous celery transcriptome sequencing and attempted to detect the genetic diversity and relationships of commonly used celery accessions and explore the efficiency of the primers used for cultivars identification. Analysis of molecular variance (AMOVA) of Apium graveolens L. var. dulce showed that approximately 43% of genetic diversity was within accessions, 45% among accessions, and 22% among horticultural types. The neighbor-joining tree generated by unweighted pair group method with arithmetic mean (UPGMA), and population structure analysis, as well as principal components analysis (PCA), separated the cultivars into clusters corresponding to the geographical areas where they originated. Genetic distance analysis suggested that genetic variation within Apium graveolens was quite limited. Genotypic diversity showed any combinations of 55 genic SSRs were able to distinguish the genotypes of all 30 accessions.

  5. Use of EST-SSR Markers for Evaluating Genetic Diversity and Fingerprinting Celery (Apium graveolens L.) Cultivars

    Directory of Open Access Journals (Sweden)

    Nan Fu

    2014-02-01

    Full Text Available Celery (Apium graveolens L.) is one of the most economically important vegetables worldwide, but genetic and genomic resources supporting celery molecular breeding are quite limited, thus few studies on celery have been conducted so far. In this study we made use of simple sequence repeat (SSR) markers generated from previous celery transcriptome sequencing and attempted to detect the genetic diversity and relationships of commonly used celery accessions and explore the efficiency of the primers used for cultivars identification. Analysis of molecular variance (AMOVA) of Apium graveolens L. var. dulce showed that approximately 43% of genetic diversity was within accessions, 45% among accessions, and 22% among horticultural types. The neighbor-joining tree generated by unweighted pair group method with arithmetic mean (UPGMA), and population structure analysis, as well as principal components analysis (PCA), separated the cultivars into clusters corresponding to the geographical areas where they originated. Genetic distance analysis suggested that genetic variation within Apium graveolens was quite limited. Genotypic diversity showed any combinations of 55 genic SSRs were able to distinguish the genotypes of all 30 accessions.

  6. An annotated corpus with nanomedicine and pharmacokinetic parameters.

    Science.gov (United States)

    Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

    2017-01-01

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.

  7. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation.

    Directory of Open Access Journals (Sweden)

    Oscar Beijbom

    Full Text Available Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images captured at four Pacific coral reefs. Inter- and intra- annotator variability among six human experts was quantified and compared to semi- and fully- automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys.

  8. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

    Directory of Open Access Journals (Sweden)

    Norihiro Maeda

    2006-04-01

    Full Text Available The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

  9. Instructional Materials Centers; Annotated Bibliography.

    Science.gov (United States)

    Poli, Rosario, Comp.

    An annotated bibliography lists 74 articles and reports on instructional materials centers (IMC) which appeared from 1967-70. The articles deal with such topics as the purposes of an IMC, guidelines for setting up an IMC, and the relationship of an IMC to technology. Most articles deal with use of an IMC on an elementary or secondary level, but…

  10. Designing Annotation Before It's Needed

    NARCIS (Netherlands)

    F.-M. Nack (Frank); W. Putz

    2001-01-01

    This paper considers the automated and semi-automated annotation of audiovisual media in a new type of production framework, A4SM (Authoring System for Syntactic, Semantic and Semiotic Modelling). We present the architecture of the framework and outline the underlying XML-Schema based

  11. Image annotation using clickthrough data

    NARCIS (Netherlands)

    T. Tsikrika (Theodora); C. Diou; A.P. de Vries (Arjen); A. Delopoulos

    2009-01-01

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the

  12. Learning Intelligent Dialogs for Bounding Box Annotation

    OpenAIRE

    Konyushkova, Ksenia; Uijlings, Jasper; Lampert, Christoph; Ferrari, Vittorio

    2017-01-01

    We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification [37], where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one based on predicting the probability that a box will be positively verified, and the other bas...

  13. Annotating images by mining image search results

    NARCIS (Netherlands)

    Wang, X.J.; Zhang, L.; Li, X.; Ma, W.Y.

    2008-01-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search

  14. Dictionary-driven protein annotation.

    Science.gov (United States)

    Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

    2002-09-01

    Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were

  15. Evaluating Hierarchical Structure in Music Annotations.

    Science.gov (United States)

    McFee, Brian; Nieto, Oriol; Farbood, Morwaread M; Bello, Juan Pablo

    2017-01-01

    Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for "flat" descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.

  16. Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects.

    Science.gov (United States)

    Pérez-Pérez, Martín; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Lourenço, Anália

    2015-02-01

    Document annotation is a key task in the development of Text Mining methods and applications. High quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although the existing annotation tools offer good user interaction interfaces to domain experts, project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle. At the core, Marky is a Web application based on the open source CakePHP framework. User interface relies on HTML5 and CSS3 technologies. Rangy library assists in browser-independent implementation of common DOM range and selection tasks, and Ajax and JQuery technologies are used to enhance user-system interaction. Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. So, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  17. Werkzeuge zur Annotation diachroner Korpora

    OpenAIRE

    Burghardt, Manuel; Wolff, Christian

    2009-01-01

    Wir diskutieren zunächst die Problematik der (syntaktischen) Annotation diachroner Korpora und stellen anschließend eine Evaluationsstudie vor, bei der mehr als 50 Annotationswerkzeuge und -frameworks vor dem Hintergrund eines funktionalen und software-ergonomischen Anforderungsprofils nach dem Qualitätsmodell von ISO/IEC 9126-1:2001 (Software engineering – Product quality – Part 1: Quality model) und ISO/IEC 25000:2005 (Software Engineering – Software product Quality Requirements and Evaluat...

  18. Semantic annotation in biomedicine: the current landscape.

    Science.gov (United States)

    Jovanović, Jelena; Bagheri, Ebrahim

    2017-09-22

    The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.

  19. AISO: Annotation of Image Segments with Ontologies.

    Science.gov (United States)

    Lingutla, Nikhil Tej; Preece, Justin; Todorovic, Sinisa; Cooper, Laurel; Moore, Laura; Jaiswal, Pankaj

    2014-01-01

    Large quantities of digital images are now generated for biological collections, including those developed in projects premised on the high-throughput screening of genome-phenome experiments. These images often carry annotations on taxonomy and observable features, such as anatomical structures and phenotype variations often recorded in response to the environmental factors under which the organisms were sampled. At present, most of these annotations are described in free text, may involve limited use of non-standard vocabularies, and rarely specify precise coordinates of features on the image plane such that a computer vision algorithm could identify, extract and annotate them. Therefore, researchers and curators need a tool that can identify and demarcate features in an image plane and allow their annotation with semantically contextual ontology terms. Such a tool would generate data useful for inter and intra-specific comparison and encourage the integration of curation standards. In the future, quality annotated image segments may provide training data sets for developing machine learning applications for automated image annotation. We developed a novel image segmentation and annotation software application, "Annotation of Image Segments with Ontologies" (AISO). The tool enables researchers and curators to delineate portions of an image into multiple highlighted segments and annotate them with an ontology-based controlled vocabulary. AISO is a freely available Java-based desktop application and runs on multiple platforms. It can be downloaded at http://www.plantontology.org/software/AISO. AISO enables curators and researchers to annotate digital images with ontology terms in a manner which ensures the future computational value of the annotated images. We foresee uses for such data-encoded image annotations in biological data mining, machine learning, predictive annotation, semantic inference, and comparative analyses.

  20. Computational algorithms to predict Gene Ontology annotations.

    Science.gov (United States)

    Pinoli, Pietro; Chicco, Davide; Masseroli, Marco

    2015-01-01

    Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from definitive and evolves rapidly, and some erroneous annotations may be present. Since the curation of novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods were able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the obtained results vary according to the dimension of the input annotation set and the considered algorithm. Out of the three considered methods, the Semantic IMproved Latent Semantic Analysis is the one that provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a
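    The Latent Semantic Indexing idea mentioned in this abstract can be illustrated with a minimal sketch: a binary gene-by-term annotation matrix is approximated by a rank-k truncated SVD, and zero entries that receive a high reconstructed score become candidate annotations. The function name, the toy matrix and the fixed threshold are assumptions made for illustration; the abstract does not give the authors' weighting schemes or decision rule.

```python
import numpy as np

def lsi_predict(annotations, k, threshold=0.5):
    """Rank-k SVD reconstruction of a gene-by-term annotation matrix.

    Returns the reconstructed score matrix and the (gene, term) index pairs
    that are 0 in the input but score at least `threshold` (an assumed cutoff).
    """
    A = np.asarray(annotations, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    rows, cols = np.where((A == 0) & (A_k >= threshold))
    candidates = [(int(g), int(t)) for g, t in zip(rows, cols)]
    return A_k, candidates

# Toy data (hypothetical): 4 genes x 4 terms, 1 = known annotation.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
])
scores, candidates = lsi_predict(toy, k=2)
print(np.round(scores, 2))
print(candidates)
```

    Lowering k smooths the matrix more aggressively and so proposes more candidates; with k equal to the full rank the input is reproduced and nothing new is suggested.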

  1. Annotation of regular polysemy and underspecification

    DEFF Research Database (Denmark)

    Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

    2013-01-01

    We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods: majority voting with a theory-compliant backoff strategy, and MACE, an unsupervised system to choose the most likely sense from all the annotations....
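    Majority voting with a backoff can be sketched as follows. The abstract does not say what the theory-compliant backoff actually is, so the tie rule and the "underspecified" label below are assumptions chosen only to show the shape of the procedure.

```python
from collections import Counter

def majority_vote(labels, backoff="underspecified"):
    """Return the most frequent label, or `backoff` when there is no clear winner.

    `backoff` is a hypothetical fallback label, not the authors' rule.
    """
    counts = Counter(labels).most_common()
    if not counts:
        return backoff
    # A tie between the two most frequent labels triggers the backoff.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return backoff
    return counts[0][0]

print(majority_vote(["literal", "literal", "metonymic"]))
print(majority_vote(["literal", "metonymic"]))
```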

  2. BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments.

    Science.gov (United States)

    López-Fernández, H; Reboiro-Jato, M; Glez-Peña, D; Aparicio, F; Gachet, D; Buenaga, M; Fdez-Riverola, F

    2013-07-01

    Automatic term annotation from biomedical documents and external information linking are becoming a necessary prerequisite in modern computer-aided medical learning systems. In this context, this paper presents BioAnnote, a flexible and extensible open-source platform for automatically annotating biomedical resources. Apart from other valuable features, the software platform includes (i) a rich client enabling users to annotate multiple documents in a user friendly environment, (ii) an extensible and embeddable annotation meta-server allowing for the annotation of documents with local or remote vocabularies and (iii) a simple client/server protocol which facilitates the use of our meta-server from any other third-party application. In addition, BioAnnote implements a powerful scripting engine able to perform advanced batch annotations. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  3. Annotating temporal information in clinical narratives.

    Science.gov (United States)

    Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem

    2013-12-01

    Temporal information in clinical narratives plays an important role in patients' diagnosis, treatment and prognosis. In order to represent narrative information accurately, medical natural language processing (MLP) systems need to correctly identify and interpret temporal information. To promote research in this area, the Informatics for Integrating Biology and the Bedside (i2b2) project developed a temporally annotated corpus of clinical narratives. This corpus contains 310 de-identified discharge summaries, with annotations of clinical events, temporal expressions and temporal relations. This paper describes the process followed for the development of this corpus and discusses annotation guideline development, annotation methodology, and corpus quality. Copyright © 2013 Elsevier Inc. All rights reserved.

  4. ANNOTATION SUPPORTED OCCLUDED OBJECT TRACKING

    Directory of Open Access Journals (Sweden)

    Devinder Kumar

    2012-08-01

    Tracking occluded objects at different depths has become an extremely important component of study for any video sequence, with wide applications in object tracking, scene recognition, coding, video editing and mosaicking. The paper studies the ability of annotation to track an occluded object based on pyramids with variation in depth, and establishes a threshold at which the system fails to track the occluded object. Image annotation is applied to 3 similar video sequences varying in depth. In the experiment, one bike occludes the other at a depth of 60cm, 80cm and 100cm respectively. Another experiment is performed on tracking humans at similar depths to authenticate the results. The paper also computes the frame-by-frame error incurred by the system, supported by detailed simulations. This system can be used to analyze the error in motion tracking and to correct it, leading to more accurate tracking. This can be of great interest to computer scientists designing surveillance systems.

  5. annot8r: GO, EC and KEGG annotation of EST datasets.

    Science.gov (United States)

    Schmid, Ralf; Blaxter, Mark L

    2008-04-09

    The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.
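    The step "BLAST results are parsed and the corresponding annotations retrieved from the reference database" can be sketched as below. This is not annot8r's code: it assumes BLAST tabular output with the standard 12 columns (query, subject, ..., e-value, bit score), replaces the reference database with an in-memory dictionary, and uses placeholder subject and GO identifiers and an arbitrary e-value cutoff.

```python
import csv
import io
from collections import defaultdict

def annotate_from_blast(blast_tsv, reference, max_evalue=1e-10):
    """Map each query to the annotation terms of its significant BLAST hits.

    blast_tsv  : text in 12-column tabular BLAST format (assumed input shape)
    reference  : dict from subject ID to a set of terms (stand-in for the database)
    max_evalue : hypothetical significance cutoff
    """
    annotations = defaultdict(set)
    reader = csv.reader(io.StringIO(blast_tsv), delimiter="\t")
    for row in reader:
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            annotations[query].update(reference.get(subject, ()))
    return {query: sorted(terms) for query, terms in annotations.items()}

# Placeholder reference data and hits, invented for this example.
reference = {"SUBJ_A": {"GO:0000001", "GO:0000002"}, "SUBJ_B": {"GO:0000003"}}
blast_tsv = "\n".join([
    "\t".join(["est1", "SUBJ_A", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-50", "380"]),
    "\t".join(["est1", "SUBJ_B", "40.0", "50", "30", "0", "1", "50", "1", "50", "0.5", "30"]),
    "\t".join(["est2", "SUBJ_B", "95.0", "150", "7", "0", "1", "150", "1", "150", "1e-30", "250"]),
])
result = annotate_from_blast(blast_tsv, reference)
print(result)
```

    The same loop would be run once per subset (GO, EC, KEGG), each with its own reference mapping.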

  6. Black English Annotations for Elementary Reading Programs.

    Science.gov (United States)

    Prasad, Sandre

    This report describes a program that uses annotations in the teacher's editions of existing reading programs to indicate the characteristics of black English that may interfere with the reading process of black children. The first part of the report provides a rationale for the annotation approach, explaining that the discrepancy between written…

  7. Ground Truth Annotation in T Analyst

    DEFF Research Database (Denmark)

    2015-01-01

    This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...

  8. Towards the Automated Annotation of Process Models

    NARCIS (Netherlands)

    Leopold, H.; Meilicke, C.; Fellmann, M.; Pittke, F.; Stuckenschmidt, H.; Mendling, J.

    2016-01-01

    Many techniques for the advanced analysis of process models build on the annotation of process models with elements from predefined vocabularies such as taxonomies. However, the manual annotation of process models is cumbersome and sometimes even hardly manageable taking the size of taxonomies into

  9. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

    Science.gov (United States)

    Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  10. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

    Directory of Open Access Journals (Sweden)

    Qiandong Zeng

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  11. Creating Gaze Annotations in Head Mounted Displays

    DEFF Research Database (Denmark)

    Mardanbeigi, Diako; Qvarfordt, Pernilla

    2015-01-01

    To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation, the user simply captures an image using the HMD’s camera, looks at an object of interest in the image, and speaks out the information to be associated with the object. The gaze location is recorded and visualized with a marker. The voice is transcribed using speech recognition. Gaze annotations can be shared. Our study showed that users found that gaze annotations add precision and expressiveness compared to annotations of the image as a whole...

  12. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Background: Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix is incomplete; for example, it does not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers time spent searching multiple GO databases for functional information. Results: We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets, we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their dataset. Conclusion: Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and

  13. Concept annotation in the CRAFT corpus.

    Science.gov (United States)

    Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

    2012-07-09

    Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
    The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

  14. Teaching and Learning Communities through Online Annotation

    Science.gov (United States)

    van der Pluijm, B.

    2016-12-01

    What do colleagues do with your assigned textbook? What do they say or think about the material? Want students to be more engaged in their learning experience? If so, online materials that complement the standard lecture format provide a new opportunity through managed, online group annotation that leverages the ubiquity of internet access, while personalizing learning. The concept is illustrated with the new online textbook "Processes in Structural Geology and Tectonics", by Ben van der Pluijm and Stephen Marshak, which offers a platform for sharing of experiences, supplementary materials and approaches, including readings, mathematical applications, exercises, challenge questions, quizzes, alternative explanations, and more. The annotation framework used is Hypothes.is, which offers a free, open platform markup environment for annotation of websites and PDF postings. The annotations can be public, grouped or individualized, as desired, including export access and download of annotations. A teacher group, hosted by a moderator/owner, limits access to members of a user group of teachers, so that its members can use, copy or transcribe annotations for their own lesson material. Likewise, an instructor can host a student group that encourages sharing of observations, questions and answers among students and instructor. Also, the instructor can create one or more closed groups that offer study help and hints to students. Options galore, all of which aim to engage students and to promote greater responsibility for their learning experience. Beyond new capacity, the ability to analyze student annotation supports individual learners and their needs. For example, student notes can be analyzed for key phrases and concepts, and to identify misunderstandings, omissions and problems. Also, example annotations can be shared to enhance notetaking skills and to help with studying. Lastly, online annotation allows active application to posted lecture slides, supporting real-time notetaking

  15. Concept annotation in the CRAFT corpus

    Science.gov (United States)

    2012-01-01

    Background: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results: This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions: As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http

  16. Automatic annotation of head velocity and acceleration in Anvil

    DEFF Research Database (Denmark)

    Jongejan, Bart

    2012-01-01

    We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements. The annotations are a useful supplement to manual annotations and may help human annotators to quickly and reliably determine onset of head movements and to suggest which kind of head movement is taking place....
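    One common way to obtain two-dimensional velocity and acceleration from tracked positions is by finite differences over the frame sequence. The sketch below assumes per-frame (x, y) face positions and a known frame rate; it is an illustration of the general technique, not the plugin's actual algorithm, and the function name and sample track are invented.

```python
import numpy as np

def velocity_and_acceleration(positions, fps):
    """Estimate 2-D velocity and acceleration from per-frame (x, y) positions.

    positions : array-like of shape (n_frames, 2), n_frames >= 2
    fps       : frames per second of the video (assumed constant)
    """
    positions = np.asarray(positions, dtype=float)
    dt = 1.0 / fps
    # Central differences inside the sequence, one-sided at both ends.
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return velocity, acceleration

# Hypothetical track: the head moves 2 pixels to the right per frame at 25 fps.
track = [[0, 0], [2, 0], [4, 0], [6, 0]]
v, a = velocity_and_acceleration(track, fps=25)
print(v)
print(a)
```

    In practice the positions would usually be smoothed first, since differentiating noisy tracker output amplifies jitter.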

  17. Semantic annotation of consumer health questions.

    Science.gov (United States)

    Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina

    2018-02-06

    Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most

  18. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results: SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to annotate these data completely ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion: SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.
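    For readers unfamiliar with the BED format mentioned above, the sketch below converts a variant given by a 1-based position into a four-column BED line, which uses a 0-based start and an exclusive end. The function and its input shape are hypothetical and are not taken from SeqAnt.

```python
def variant_to_bed(chrom, pos, ref, alt):
    """Format one variant as a BED line: chrom, start, end, name.

    pos is the 1-based position of the first reference base; BED intervals
    are 0-based and half-open, so start = pos - 1 and end = start + len(ref).
    """
    if pos < 1:
        raise ValueError("pos must be a 1-based coordinate")
    start = pos - 1
    end = start + len(ref)
    return "\t".join([chrom, str(start), str(end), f"{ref}>{alt}"])

# Invented example variants.
print(variant_to_bed("chr1", 100, "A", "G"))
print(variant_to_bed("chr2", 500, "CTT", "C"))
```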

  19. Making web annotations persistent over time

    Energy Technology Data Exchange (ETDEWEB)

    Sanderson, Robert [Los Alamos National Laboratory; Van De Sompel, Herbert [Los Alamos National Laboratory

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
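    Navigation from a resource URI to an archived version in Memento works by datetime negotiation: the client sends the desired datetime in an Accept-Datetime request header, formatted as an HTTP date in GMT. The sketch below only builds that header; the helper name is hypothetical, and tying the datetime to an annotation's creation time is an assumption about how the approach described in the abstract might be used, not the authors' implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when):
    """Build request headers asking for the version of a resource near `when`.

    `when` must be a timezone-aware datetime, for example the time at which
    an annotation was created (assumed use case).
    """
    if when.tzinfo is None:
        raise ValueError("`when` must be timezone-aware")
    as_utc = when.astimezone(timezone.utc)
    # usegmt=True renders the zone as "GMT", as HTTP dates require.
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}

annotation_time = datetime(2010, 4, 1, tzinfo=timezone.utc)
headers = memento_headers(annotation_time)
print(headers)
```

    These headers would be sent with a GET or HEAD request to a TimeGate for the annotated resource, which then redirects to the closest archived version.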

  20. Computational annotation of genes differentially expressed along olive fruit development

    Directory of Open Access Journals (Sweden)

    Martinelli Federico

    2009-10-01

    used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening.

  1. Crowdsourcing and annotating NER for Twitter #drift

    DEFF Research Database (Denmark)

    Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

    2014-01-01

    We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible...
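    The abstract reports agreement as kappa without saying which variant was used. As a reference point, the sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the choice of Cohen's kappa and the toy token labels are assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Invented per-token NER labels from two annotators.
annotator_1 = ["PER", "PER", "O", "O"]
annotator_2 = ["PER", "O", "O", "O"]
print(cohen_kappa(annotator_1, annotator_2))
```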

  2. Annotating and Interpreting Linear and Cyclic Peptide Tandem Mass Spectra.

    Science.gov (United States)

    Niedermeyer, Timo Horst Johannes

    2016-01-01

    Nonribosomal peptides often possess pronounced bioactivity, and thus, they are often interesting hit compounds in natural product-based drug discovery programs. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and, especially in the case of cyclic peptides, the complex fragmentation patterns observed. This makes nonribosomal peptide tandem mass spectra annotation challenging and time-consuming. To meet this challenge, software tools for this task have been developed. In this chapter, the workflow for using the software mMass for the annotation of experimentally obtained peptide tandem mass spectra is described. mMass is freely available (http://www.mmass.org), open-source, and the most advanced and user-friendly software tool for this purpose. The software enables the analyst to concisely annotate and interpret tandem mass spectra of linear and cyclic peptides. Thus, it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.

  3. Meteor showers an annotated catalog

    CERN Document Server

    Kronk, Gary W

    2014-01-01

    Meteor showers are among the most spectacular celestial events that may be observed by the naked eye, and have been the object of fascination throughout human history. In “Meteor Showers: An Annotated Catalog,” the interested observer can access detailed research on over 100 annual and periodic meteor streams in order to capitalize on these majestic spectacles. Each meteor shower entry includes details of their discovery, important observations and orbits, and gives a full picture of duration, location in the sky, and expected hourly rates. Armed with a fuller understanding, the amateur observer can better view and appreciate the shower of their choice. The original book, published in 1988, has been updated with over 25 years of research in this new and improved edition. Almost every meteor shower study is expanded, with some original minor showers being dropped while new ones are added. The book also includes breakthroughs in the study of meteor showers, such as accurate predictions of outbursts as well ...

  4. AIGO: Towards a unified framework for the Analysis and the Inter-comparison of GO functional annotations

    Directory of Open Access Journals (Sweden)

    Defoin-Platel Michael

    2011-11-01

    Full Text Available Abstract Background In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. Results In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. Conclusions This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.

  5. Guidelines for visualizing and annotating rule-based models

    Science.gov (United States)

    Chylek, Lily A.; Hu, Bin; Blinov, Michael L.; Emonet, Thierry; Faeder, James R.; Goldstein, Byron; Gutenkunst, Ryan N.; Haugh, Jason M.; Lipniacki, Tomasz; Posner, Richard G.; Yang, Jin; Hlavacek, William S.

    2011-01-01

    Rule-based modeling provides a means to represent cell signaling systems in a way that captures site-specific details of molecular interactions. For rule-based models to be more widely understood and (re)used, conventions for model visualization and annotation are needed. We have developed the concepts of an extended contact map and a model guide for illustrating and annotating rule-based models. An extended contact map represents the scope of a model by providing an illustration of each molecule, molecular component, direct physical interaction, post-translational modification, and enzyme-substrate relationship considered in a model. A map can also illustrate allosteric effects, structural relationships among molecular components, and compartmental locations of molecules. A model guide associates elements of a contact map with annotation and elements of an underlying model, which may be fully or partially specified. A guide can also serve to document the biological knowledge upon which a model is based. We provide examples of a map and guide for a published rule-based model that characterizes early events in IgE receptor (FcεRI) signaling. We also provide examples of how to visualize a variety of processes that are common in cell signaling systems but not considered in the example model, such as ubiquitination. An extended contact map and an associated guide can document knowledge of a cell signaling system in a form that is visual as well as executable. As a tool for model annotation, a map and guide can communicate the content of a model clearly and with precision, even for large models. PMID:21647530

  6. Guidelines for visualizing and annotating rule-based models.

    Science.gov (United States)

    Chylek, Lily A; Hu, Bin; Blinov, Michael L; Emonet, Thierry; Faeder, James R; Goldstein, Byron; Gutenkunst, Ryan N; Haugh, Jason M; Lipniacki, Tomasz; Posner, Richard G; Yang, Jin; Hlavacek, William S

    2011-10-01

    Rule-based modeling provides a means to represent cell signaling systems in a way that captures site-specific details of molecular interactions. For rule-based models to be more widely understood and (re)used, conventions for model visualization and annotation are needed. We have developed the concepts of an extended contact map and a model guide for illustrating and annotating rule-based models. An extended contact map represents the scope of a model by providing an illustration of each molecule, molecular component, direct physical interaction, post-translational modification, and enzyme-substrate relationship considered in a model. A map can also illustrate allosteric effects, structural relationships among molecular components, and compartmental locations of molecules. A model guide associates elements of a contact map with annotation and elements of an underlying model, which may be fully or partially specified. A guide can also serve to document the biological knowledge upon which a model is based. We provide examples of a map and guide for a published rule-based model that characterizes early events in IgE receptor (FcεRI) signaling. We also provide examples of how to visualize a variety of processes that are common in cell signaling systems but not considered in the example model, such as ubiquitination. An extended contact map and an associated guide can document knowledge of a cell signaling system in a form that is visual as well as executable. As a tool for model annotation, a map and guide can communicate the content of a model clearly and with precision, even for large models.

  7. An Informally Annotated Bibliography of Sociolinguistics.

    Science.gov (United States)

    Tannen, Deborah

    This annotated bibliography of sociolinguistics is divided into the following sections: speech events, ethnography of speaking and anthropological approaches to analysis of conversation; discourse analysis (including analysis of conversation and narrative), ethnomethodology and nonverbal communication; sociolinguistics; pragmatics (including…

  8. Annotation and retrieval in protein interaction databases

    Science.gov (United States)

    Cannataro, Mario; Hiram Guzzi, Pietro; Veltri, Pierangelo

    2014-06-01

    Biological databases have been developed with a special focus on the efficient retrieval of single records or the efficient computation of specialized bioinformatics algorithms against the overall database, such as in sequence alignment. The continuous production of biological knowledge spread across several biological databases and ontologies, such as Gene Ontology, and the availability of efficient techniques to handle such knowledge, such as annotation and semantic similarity measures, enable the development of novel bioinformatics applications that explicitly use and integrate such knowledge. After introducing the annotation process and the main semantic similarity measures, this paper shows how annotations and semantic similarity can be exploited to improve the extraction and analysis of biologically relevant data from protein interaction databases. As case studies, the paper presents two novel software tools, OntoPIN and CytoSeVis, both based on the use of Gene Ontology annotations, for the advanced querying of protein interaction databases and for the enhanced visualization of protein interaction networks.

  9. SASL: A Semantic Annotation System for Literature

    Science.gov (United States)

    Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin; Jin, Hai

    Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviated, which induces ambiguity. Here, SASL uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents the experimental results, which confirm that SASL performs well.
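    The abstract does not give the regular expressions SASL uses, so the following is only a rough sketch of the general idea: find "Full Name (ABBR)" patterns and keep those where the initials of the preceding words spell the abbreviation. The pattern, the function name and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: two to six capitalised words followed by an
# upper-case abbreviation in parentheses, e.g. "Hidden Markov Model (HMM)".
FULL_NAME_ABBR = re.compile(
    r"((?:[A-Z][a-z]+\s+){1,5}[A-Z][a-z]+)\s+\(([A-Z]{2,6})\)"
)


def extract_abbreviations(text):
    """Map each abbreviation to the set of full names found for it."""
    mapping = {}
    for match in FULL_NAME_ABBR.finditer(text):
        words, abbr = match.group(1).split(), match.group(2)
        # Keep only the trailing words whose initials spell the abbreviation.
        candidate = words[-len(abbr):]
        initials = "".join(word[0] for word in candidate)
        if len(candidate) == len(abbr) and initials == abbr:
            mapping.setdefault(abbr, set()).add(" ".join(candidate))
    return mapping


sample = ("We use a Hidden Markov Model (HMM) and "
          "a Support Vector Machine (SVM).")
pairs = extract_abbreviations(sample)
```

    Storing a set of full names per abbreviation reflects the ambiguity the abstract mentions: when one abbreviation collects several full names, a separate disambiguation step is still needed. This simple pattern also misses names containing lower-case function words, such as "Semantic Annotation System for Literature (SASL)".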

  10. Temporal Annotation in the Clinical Domain

    Science.gov (United States)

    Styler, William F.; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet C; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana; Pustejovsky, James

    2014-01-01

    This article discusses the requirements of a formal specification for the annotation of temporal information in clinical narratives. We discuss the implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus. To reflect the information task and the heavily inference-based reasoning demands in the domain, a new annotation guideline has been developed, “the THYME Guidelines to ISO-TimeML (THYME-TimeML)”. To clarify what relations merit annotation, we distinguish between linguistically-derived and inferentially-derived temporal orderings in the text. We also apply a top performing TempEval 2013 system against this new resource to measure the difficulty of adapting systems to the clinical domain. The corpus is available to the community and has been proposed for use in a SemEval 2015 task. PMID:29082229

  11. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  12. Annotated Tsunami bibliography: 1962-1976

    International Nuclear Information System (INIS)

    Pararas-Carayannis, G.; Dong, B.; Farmer, R.

    1982-08-01

    This compilation contains annotated citations to nearly 3000 tsunami-related publications from 1962 to 1976 in English and several other languages. The foreign-language citations have English titles and abstracts

  13. Fluid Annotations in an Open World

    DEFF Research Database (Denmark)

    Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning

    2001-01-01

    Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to layer fluid annotations and links on top of arbitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required.

  14. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  15. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832

  16. Annotation Method (AM): SE41_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE41_AM1 PowerGet annotation In annotation process, KEGG, KNApSAcK and LipidMAPS ar..., predicted molecular formulas are used for the annotation. MS/MS patterns was used to suggest functional gr...-MS Fragment Viewer (http://webs2.kazusa.or.jp/msmsfragmentviewer/) are used for annotation and identification of the compounds. ...

  17. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, many plant genomes and transcriptomes have been sequenced. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  18. Annotation of microsporidian genomes using transcriptional signals.

    Science.gov (United States)

    Peyretaillade, Eric; Parisot, Nicolas; Polonais, Valérie; Terrat, Sébastien; Denonfoux, Jérémie; Dugat-Bony, Eric; Wawrzyniak, Ivan; Biderre-Petit, Corinne; Mahul, Antoine; Rimour, Sébastien; Gonçalves, Olivier; Bornes, Stéphanie; Delbac, Frédéric; Chebance, Brigitte; Duprat, Simone; Samson, Gaëlle; Katinka, Michael; Weissenbach, Jean; Wincker, Patrick; Peyret, Pierre

    2012-01-01

    High-quality annotation of microsporidian genomes is essential for understanding the biological processes that govern the development of these parasites. Here we present an improved structural annotation method using transcriptional DNA signals. We apply this method to re-annotate four previously annotated genomes, which allow us to detect annotation errors and identify a significant number of unpredicted genes. We then annotate the newly sequenced genome of Anncaliia algerae. A comparative genomic analysis of A. algerae permits the identification of not only microsporidian core genes, but also potentially highly expressed genes encoding membrane-associated proteins, which represent good candidates involved in the spore architecture, the invasion process and the microsporidian-host relationships. Furthermore, we find that the ten-fold variation in microsporidian genome sizes is not due to gene number, size or complexity, but instead stems from the presence of transposable elements. Such elements, along with kinase regulatory pathways and specific transporters, appear to be key factors in microsporidian adaptive processes.

  19. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
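    The recall and precision figures reported above follow the usual definitions: precision is the share of predicted annotations that are correct, and recall is the share of gold-standard annotations that were found. A minimal sketch, using made-up gene-disease pairs that are not taken from the study:

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two sets of annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy gene-disease pairs, for illustration only.
gold = {
    ("BRCA1", "breast cancer"),
    ("TP53", "Li-Fraumeni syndrome"),
    ("CFTR", "cystic fibrosis"),
    ("HBB", "sickle cell anemia"),
}
predicted = {
    ("BRCA1", "breast cancer"),
    ("TP53", "Li-Fraumeni syndrome"),
    ("CFTR", "asthma"),
}
precision, recall = precision_recall(predicted, gold)
```

    Here two of the three predicted pairs are correct (precision 2/3) and two of the four gold pairs are recovered (recall 1/2).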

  20. GRAIL and GenQuest Sequence Annotation Tools

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Ying; Shah, Manesh B.; Einstein, J. Ralph; Parang, Morey; Snoddy, Jay; Petrov, Sergey; Olman, Victor; Zhang, Ge; Mural, Richard J.; Uberbacher, Edward C.

    1997-12-31

    Our goal is to develop and implement an integrated intelligent system which can recognize biologically significant features in DNA sequence and provide insight into the organization and function of regions of genomic DNA. GRAIL is a modular expert system which facilitates the recognition of gene features and provides an environment for the construction of sequence annotation. The last several years have seen a rapid evolution of the technology for analyzing genomic DNA sequences. The current GRAIL systems (including the e-mail, XGRAIL, JAVA-GRAIL and genQuest systems) are perhaps the most widely used, comprehensive, and user friendly systems available for computational characterization of genomic DNA sequence.

  1. annot8r: GO, EC and KEGG annotation of EST datasets

    Directory of Open Access Journals (Sweden)

    Schmid Ralf

    2008-04-01

    Full Text Available Abstract Background The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non
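    The "parse BLAST results, then look up annotations" step described above can be sketched roughly as follows. This is not annot8r's own code: it assumes BLAST tabular output with the twelve default columns (query ID, subject ID, ..., e-value, bit score), and the function names, the e-value cut-off and the in-memory reference table are hypothetical stand-ins for the reference database.

```python
import csv
import io


def best_hits(blast_tabular, max_evalue=1e-5):
    """Return the best-scoring subject per query from 12-column BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to transfer an annotation
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def annotate(hits, reference):
    """Look up the annotation terms stored for each query's best hit."""
    return {query: reference.get(subject, []) for query, subject in hits.items()}


# Illustrative hits and a toy reference table (not real annot8r data).
rows = [
    ["est1", "P12345", "98.0", "100", "2", "0", "1", "100", "1", "100", "1e-50", "180"],
    ["est1", "Q99999", "80.0", "100", "20", "0", "1", "100", "1", "100", "1e-20", "90"],
    ["est2", "Q99999", "40.0", "50", "30", "0", "1", "50", "1", "50", "0.5", "20"],
]
sample_output = "\n".join("\t".join(row) for row in rows)
reference = {"P12345": ["GO:0005524"], "Q99999": ["GO:0003677"]}
annotations = annotate(best_hits(sample_output), reference)
```

    In this toy run only est1 receives an annotation, because the single hit for est2 has an e-value above the cut-off.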

  2. Metannogen: annotation of biological reaction networks.

    Science.gov (United States)

    Gille, Christoph; Hübner, Katrin; Hoppe, Andreas; Holzhütter, Hermann-Georg

    2011-10-01

    Semantic annotations of the biochemical entities constituting a biological reaction network are indispensable to create biologically meaningful networks. They further heighten efficient exchange, reuse and merging of existing models which concern present-day systems biology research more often. Two types of tools for the reconstruction of biological networks currently exist: (i) several sophisticated programs support graphical network editing and visualization. (ii) Data management systems permit reconstruction and curation of huge networks in a team of scientists including data integration, annotation and cross-referencing. We sought ways to combine the advantages of both approaches. Metannogen, which was previously developed for network reconstruction, has been considerably improved. From now on, Metannogen provides SBML import and annotation of networks created elsewhere. This permits users of other network reconstruction platforms or modeling software to annotate their networks using Metannogen's advanced information management. We implemented word-autocompletion, multipattern highlighting, spell check, brace-expansion and publication management, and improved annotation, cross-referencing and team work requirements. Unspecific enzymes and transporters acting on a spectrum of different substrates are efficiently handled. The network can be exported in SBML format where the annotations are embedded in line with the MIRIAM standard. For more comfort, Metannogen may be tightly coupled with the network editor such that Metannogen becomes an additional view for the focused reaction in the network editor. Finally, Metannogen provides local single user, shared password protected multiuser or public access to the annotation data. Metannogen is available free of charge at: http://www.bioinformatics.org/strap/metannogen/ or http://3d-alignment.eu/metannogen/. Contact: christoph.gille@charite.de. Supplementary data are available at Bioinformatics online.

  3. Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation

    Directory of Open Access Journals (Sweden)

    Qaiser ABBAS

    2016-12-01

    Full Text Available This work elaborates the semi-semantic part of speech annotation guidelines for the URDU.KON-TB treebank, an annotated corpus. A hierarchical annotation scheme was designed to label the part of speech and then applied to the corpus. The raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part of speech labels. The corpus contains text of local and international news, social stories, sports, culture, finance, religion, traveling, etc. This exercise finally contributed a part of speech annotation to the URDU.KON-TB treebank. Twenty-two main part of speech categories are divided into subcategories, which capture the morphological and semantic information encoded in them. This article mainly reports the annotation guidelines; however, it also briefly describes the development of the URDU.KON-TB treebank, which includes the raw corpus collection, the design and application of the annotation scheme and, finally, its statistical evaluation and results. The guidelines presented here will be useful for the linguistic community to annotate sentences not only in the national language Urdu but also in other indigenous languages such as Punjabi, Sindhi, Pashto, etc.

  4. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  5. Active learning reduces annotation time for clinical concept extraction.

    Science.gov (United States)

    Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

    2017-10-01

    To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
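
The abstract above does not name the query strategies it compared. As a minimal sketch of what one such strategy can look like, the Python snippet below implements least-confidence sampling, a common active learning query strategy chosen here purely as an illustrative assumption. The function name `least_confidence_query` and the toy probability table are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def least_confidence_query(
    probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Pick the items whose most likely label has the lowest probability.

    probabilities[i] holds the model's predicted label distribution for
    unlabeled item i (an assumed input; any probabilistic tagger would do).
    """
    # Confidence of an item = probability of its single most likely label.
    confidence = [max(row) for row in probabilities]
    # Rank items from least to most confident and keep the first batch_size.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:batch_size]


if __name__ == "__main__":
    # Toy pool of four unlabeled sequences with three candidate labels each.
    pool = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.60, 0.30, 0.10],
        [0.34, 0.33, 0.33],
    ]
    print(least_confidence_query(pool, batch_size=2))
```

In a pre-annotation workflow of the kind the abstract describes, the selected items would be shown to the annotator together with the model's predicted labels for review.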

  6. Annotating Logical Forms for EHR Questions.

    Science.gov (United States)

    Roberts, Kirk; Demner-Fushman, Dina

    2016-05-01

    This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into a structured query. A layered annotation strategy is used which mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.

  7. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas

    2007-01-01

    - and intergenomic regions. The presence of multiple coding regions complicates the concept of Ka/Ks ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping...... may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We...... obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag...

  8. Motion lecture annotation system to learn Naginata performances

    Science.gov (United States)

    Kobayashi, Daisuke; Sakamoto, Ryota; Nomura, Yoshihiko

    2013-12-01

    This paper describes a learning assistant system using motion capture data and annotation to teach "Naginata-jutsu" (a skill to practice Japanese halberd) performance. There are some video annotation tools such as YouTube. However, these video-based tools offer only a single angle of view. Our approach, which uses motion-captured data, allows viewing from any angle. A lecturer can write annotations related to parts of the body. We compared the effectiveness of the YouTube annotation tool and the proposed system. The experimental result showed that our system triggered more annotations than the YouTube annotation tool.

  9. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  10. Software for computing and annotating genomic ranges.

    Science.gov (United States)

    Lawrence, Michael; Huber, Wolfgang; Pagès, Hervé; Aboyoun, Patrick; Carlson, Marc; Gentleman, Robert; Morgan, Martin T; Carey, Vincent J

    2013-01-01

    We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
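
To make the notion of overlap detection on annotated genomic ranges concrete, here is a minimal Python sketch. It is not the Bioconductor API (the packages named above are R packages), and it uses a naive pairwise scan rather than the efficient algorithms the abstract refers to. The `Range` type, the `find_overlaps` function and the 1-based inclusive coordinate convention are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    """A genomic interval; coordinates are assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int


def find_overlaps(query: List[Range], subject: List[Range]) -> List[Tuple[int, int]]:
    """Return (query_index, subject_index) pairs for every overlapping pair.

    Two ranges overlap when they are on the same chromosome and each one
    starts no later than the other ends. This is an O(n * m) scan.
    """
    hits = []
    for qi, q in enumerate(query):
        for si, s in enumerate(subject):
            if q.chrom == s.chrom and q.start <= s.end and s.start <= q.end:
                hits.append((qi, si))
    return hits


if __name__ == "__main__":
    reads = [Range("chr1", 100, 200), Range("chr2", 50, 60)]
    genes = [Range("chr1", 150, 250), Range("chr1", 201, 300), Range("chr2", 60, 70)]
    print(find_overlaps(reads, genes))
```

A production implementation would typically sort the ranges or index them in an interval structure so that each query avoids scanning every subject range.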

  11. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    Full Text Available We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

  12. Ranking Biomedical Annotations with Annotator’s Semantic Relevancy

    Directory of Open Access Journals (Sweden)

    Aihua Wu

    2014-01-01

    Full Text Available Biomedical annotation is a common and effective artifact for researchers to discuss, express opinions, and share discoveries. It is becoming increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users who need to obtain information efficiently. As the annotator’s knowledge about the annotated entity normally determines the quality of the annotations, we evaluate that knowledge, that is, the semantic relationship between them, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to users’ votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large.

  13. Qcorp: an annotated classification corpus of Chinese health questions.

    Science.gov (United States)

    Guo, Haihong; Na, Xu; Li, Jiao

    2018-03-22

    Health question-answering (QA) systems have become a typical application scenario of artificial intelligence (AI). An annotated question corpus is a prerequisite for training machines to understand the health information needs of users. Thus, we aimed to develop an annotated classification corpus of Chinese health questions (Qcorp) and make it openly accessible. We developed a two-layered classification schema and corresponding annotation rules on the basis of our previous work. Using the schema, we annotated 5000 questions that were randomly selected from 5 Chinese health websites within 6 broad sections. Eight annotators participated in the annotation task, and the inter-annotator agreement was evaluated to ensure the corpus quality. Furthermore, the distribution and relationship of the annotated tags were measured by descriptive statistics and a social network map. The questions were annotated using 7101 tags that cover 29 topic categories in the two-layered schema. In our released corpus, the distribution of questions over the top-layer categories was treatment (64.22%), diagnosis (37.14%), epidemiology (14.96%), healthy lifestyle (10.38%), and health provider choice (4.54%). Both the annotated health questions and the annotation schema were openly accessible on the Qcorp website. Users can download the annotated Chinese questions in CSV, XML, and HTML format. We developed a Chinese health question corpus including 5000 manually annotated questions. It is openly accessible and would contribute to the development of intelligent health QA systems.

  14. Computer systems for annotation of single molecule fragments

    Science.gov (United States)

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.

  15. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will

    2009-01-01

    Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited...... in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology...... and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus...

  16. Bibliografia de Aztlan: An Annotated Chicano Bibliography.

    Science.gov (United States)

    Barrios, Ernie, Ed.

    More than 300 books and articles published from 1920 to 1971 are reviewed in this annotated bibliography of literature on the Chicano. The citations and reviews are categorized by subject area and deal with contemporary Chicano history, education, health, history of Mexico, literature, native Americans, philosophy, political science, pre-Columbian…

  17. DIMA – Annotation guidelines for German intonation

    DEFF Research Database (Denmark)

    Kügler, Frank; Smolibocki, Bernadett; Arnold, Denis

    2015-01-01

    This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups...

  18. Structuring and presenting annotated media repositories

    NARCIS (Netherlands)

    L. Rutledge (Lloyd); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2004-01-01

    The Semantic Web envisions a Web that is both human readable and machine processible. In practice, however, there is still a large conceptual gap between annotated content repositories on the one hand, and coherent, human readable Web pages on the other. To bridge this conceptual gap,

  19. Canonical Processes of Semantically Annotated Media Production

    NARCIS (Netherlands)

    Hardman, L.; Obrenović, Ž.; Nack, F.; Troncy, R.; Huet, B.; Schenk, S.

    2011-01-01

    While many multimedia systems allow the association of semantic annotations with media assets, there is no agreed way of sharing these among systems. This chapter identifies a small number of fundamental processes of media production, which the author terms canonical processes, which can be

  20. Canonical processes of semantically annotated media production

    NARCIS (Netherlands)

    L. Hardman (Lynda); Z. Obrenovic; F.-M. Nack (Frank); B. Kerhervé; K. Piersol

    2008-01-01

    While many multimedia systems allow the association of semantic annotations with media assets, there is no agreed-upon way of sharing these among systems. As an initial step within the multimedia community, we identify a small number of fundamental processes of media production, which we

  1. Canonical processes of semantically annotated media production

    NARCIS (Netherlands)

    Hardman, L.; Obrenović, Ž.; Nack, F.; Kerhervé, B.; Piersol, K.

    2008-01-01

    While many multimedia systems allow the association of semantic annotations with media assets, there is no agreed-upon way of sharing these among systems. As an initial step within the multimedia community, we identify a small number of fundamental processes of media production, which we term

  2. Suggested Books for Children: An Annotated Bibliography

    Science.gov (United States)

    NHSA Dialog, 2008

    2008-01-01

    This article provides an annotated bibliography of various children's books. It includes listings of books that illustrate the dynamic relationships within the natural environment, economic context, racial and cultural identities, cross-group similarities and differences, gender, different abilities and stories of injustice and resistance.

  3. Teaching Creative Writing: A Selective, Annotated Bibliography.

    Science.gov (United States)

    Bishop, Wendy; And Others

    Focusing on pedagogical issues in creative writing, this annotated bibliography reviews 149 books, articles, and dissertations in the fields of creative writing and composition, and, selectively, feminist and literary theory. Anthologies of original writing and reference books are not included. (MM)

  4. Statistical mechanics of ontology based annotations

    Science.gov (United States)

    Hoyle, David C.; Brass, Andrew

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotated increases. In doing so we provide a further possible measure for assessment of ontologies.

  5. An Annotated Bibliography in Financial Therapy

    Directory of Open Access Journals (Sweden)

    Dorothy B. Durband

    2010-10-01

    Full Text Available The following annotated bibliography contains a summary of articles and websites, as well as a list of books related to financial therapy. The resources were compiled through e-mail solicitation from members of the Financial Therapy Forum in November 2008. Members of the forum are marked with an asterisk.

  6. Just-in-time : on strategy annotations

    NARCIS (Netherlands)

    J.C. van de Pol (Jaco)

    2001-01-01

    A simple kind of strategy annotations is investigated, giving rise to a class of strategies, including leftmost-innermost. It is shown that under certain restrictions, an interpreter can be written which computes the normal form of a term in a bottom-up traversal. The main contribution

  7. Multimedia Annotations on the Semantic Web

    NARCIS (Netherlands)

    Stamou, G.; Ossenbruggen, J.R.; Pan, J.; Schreiber, A.T.

    2006-01-01

    Multimedia in all forms (images, video, graphics, music, speech) is exploding on the Web. The content needs to be annotated and indexed to enable effective search and retrieval. However, recent standards and best practices for multimedia metadata don't provide semantically rich descriptions of

  8. Multimedia Annotations on the Semantic Web

    NARCIS (Netherlands)

    G. Stamou; J.R. van Ossenbruggen (Jacco); J.Z. Pan (Jeff); G. Schreiber (Guus)

    2006-01-01

    Multimedia in all forms (images, video, graphics, music, speech) is exploding on the Web. The content needs to be annotated and indexed to enable effective search and retrieval. However, recent standards and best practices for multimedia metadata don't provide semantically rich

  9. La Mujer Chicana: An Annotated Bibliography, 1976.

    Science.gov (United States)

    Chapa, Evey, Ed.; And Others

    Intended to provide interested persons, researchers, and educators with information about "la mujer Chicana", this annotated bibliography cites 320 materials published between 1916 and 1975, with the majority being between 1960 and 1975. The 12 sections cover the following subject areas: Chicana publications; Chicana feminism and…

  10. Annotated Bibliography of EDGE2D Use

    International Nuclear Information System (INIS)

    Strachan, J.D.; Corrigan, G.

    2005-01-01

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables

  11. Male-Female Sexuality: An Annotated Bibliography.

    Science.gov (United States)

    Wilson, Janice

    This annotated bibliography contains over 500 sources on the historical and contemporary development and expression of male and female sexuality. There are 68 topic headings which provide easy access for subject areas. A major portion of the bibliography is devoted to contemporary male-female sexuality. These materials consist of research findings…

  12. Mulligan Concept manual therapy: standardizing annotation.

    Science.gov (United States)

    McDowell, Jillian Marie; Johnson, Gillian Margaret; Hetherington, Barbara Helen

    2014-10-01

    Quality technique documentation is integral to the practice of manual therapy, ensuring uniform application and reproducibility of treatment. Manual therapy techniques are described by annotations utilizing a range of acronyms, abbreviations and universal terminology based on biomechanical and anatomical concepts. The various combinations of therapist- and patient-generated forces utilized in a variety of weight-bearing positions, which are synonymous with the Mulligan Concept, challenge practitioners' existing annotational skills. An annotation framework with recording rules adapted to the Mulligan Concept is proposed in which the abbreviations incorporate established manual therapy tenets and are detailed in the following sequence: starting position, side, joint/s, method of application, glide/s, Mulligan technique, movement (or function), whether an assistant is used, overpressure (and by whom) and numbers of repetitions or time and sets. Therapist or patient application of overpressure and utilization of treatment belts or manual techniques must be recorded to capture the complete description. The adoption of the Mulligan Concept annotation framework in this way for documentation purposes will provide uniformity and clarity of information transfer for the future purposes of teaching, clinical practice and audit for its practitioners. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. An Annotated Publications List on Homelessness.

    Science.gov (United States)

    Tutunjian, Beth Ann

    This annotated publications list on homelessness contains citations for 19 publications, most of which deal with problems of alcohol or drug abuse among homeless persons. Citations are listed alphabetically by author and cover the topics of homelessness and alcoholism, drug abuse, public policy, research methodologies, mental illness, alcohol- and…

  14. Book Reviews, Annotation, and Web Technology.

    Science.gov (United States)

    Schulze, Patricia

    From reading texts to annotating web pages, grade 6-8 students rely on group cooperation and individual reading and writing skills in this research project that spans six 50-minute lessons. Student objectives for this project are that they will: read, discuss, and keep a journal on a book in literature circles; understand the elements of and…

  15. Genotyping and annotation of Affymetrix SNP arrays

    DEFF Research Database (Denmark)

    Lamy, Philippe; Andersen, Claus Lindbjerg; Wikman, Friedrik

    2006-01-01

    allows us to annotate SNPs that have poor performance, either because of poor experimental conditions or because for one of the alleles the probes do not behave in a dose-response manner. Generally, our method agrees well with a method developed by Affymetrix. When both methods make a call they agree...

  16. Snap: an integrated SNP annotation platform

    DEFF Research Database (Denmark)

    Li, Shengting; Ma, Lijia; Li, Heng

    2007-01-01

    Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical...

  17. Annotating State of Mind in Meeting Data

    NARCIS (Netherlands)

    Heylen, Dirk K.J.; Reidsma, Dennis; Ordelman, Roeland J.F.; Devillers, L.; Martin, J-C.; Cowie, R.; Batliner, A.

    We discuss the annotation procedure for mental state and emotion that is under development for the AMI (Augmented Multiparty Interaction) corpus. The categories that were found to be most appropriate relate not only to emotions but also to (meta-)cognitive states and interpersonal variables. The

  18. ePNK Applications and Annotations

    DEFF Research Database (Denmark)

    Kindler, Ekkart

    2017-01-01

    new applications for the ePNK and, in particular, visualizing the result of an application in the graphical editor of the ePNK by using annotations, and interacting with the end user using these annotations. In this paper, we give an overview of the concepts of ePNK applications by discussing the implementation...

  19. Evaluating automatically annotated treebanks for linguistic research

    NARCIS (Netherlands)

    Bloem, J.; Bański, P.; Kupietz, M.; Lüngen, H.; Witt, A.; Barbaresi, A.; Biber, H.; Breiteneder, E.; Clematide, S.

    2016-01-01

    This study discusses evaluation methods for linguists to use when employing an automatically annotated treebank as a source of linguistic evidence. While treebanks are usually evaluated with a general measure over all the data, linguistic studies often focus on a particular construction or a group

  20. Indiana Newspaper History: An Annotated Bibliography.

    Science.gov (United States)

    Popovich, Mark, Comp.; And Others

    The purposes of this bibliography are to bring together materials that relate to the history of newspapers in Indiana and to assess, in a general way, the value of the material. The bibliography contains 415 entries, with descriptive annotations, arranged in seven sections: books; special materials; general newspaper histories and lists of…

  1. Multiview Hessian regularization for image annotation.

    Science.gov (United States)

    Liu, Weifeng; Tao, Dacheng

    2013-07-01

    The rapid development of computer hardware and Internet technology makes large scale data dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) therefore received intensive attention in recent years and was successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it is observed that LR biases the classification function toward a constant function, which possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.

  2. Annotated Bibliography of English for Special Purposes.

    Science.gov (United States)

    Allix, Beverley, Comp.

    This annotated bibliography covers the following types of materials of use to teachers of English for Special Purposes: (1) books, monographs, reports, and conference papers; (2) periodical articles and essays in collections; (3) theses and dissertations; (4) bibliographies; (5) dictionaries; and (6) textbooks in series by publisher. Section (1)…

  3. Great Basin Experimental Range: Annotated bibliography

    Science.gov (United States)

    E. Durant McArthur; Bryce A. Richardson; Stanley G. Kitchen

    2013-01-01

    This annotated bibliography documents the research that has been conducted on the Great Basin Experimental Range (GBER, also known as the Utah Experiment Station, Great Basin Station, the Great Basin Branch Experiment Station, Great Basin Experimental Center, and other similar name variants) over the 102 years of its existence. Entries were drawn from the original...

  4. Chemical Principles Revisited: Annotating Reaction Equations.

    Science.gov (United States)

    Tykodi, R. J.

    1987-01-01

    Urges chemistry teachers to have students annotate the chemical reactions in aqueous-solutions that they see in their textbooks and witness in the laboratory. Suggests this will help students recognize the reaction type more readily. Examples are given for gas formation, precipitate formation, redox interaction, acid-base interaction, and…

  5. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al., 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome, the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  6. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Full Text Available Abstract Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1–1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  7. Automatic Semantic Annotation of Music with Harmonic Structure

    OpenAIRE

    Weyde, T.

    2007-01-01

    This paper presents an annotation model for harmonic structure of a piece of music, and a rule system that supports the automatic generation of harmonic annotations. Musical structure has so far received relatively little attention in the context of musical metadata and annotation, although it is highly relevant for musicians, musicologists and indirectly for music listeners. Activities in semantic annotation of music have so far mostly concentrated on features derived from audio data and fil...

  8. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-08-18

    Background: Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results: The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. Conclusions: We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  9. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .

  10. A SANE approach to annotation in the digital edition

    NARCIS (Netherlands)

    Boot, P.; Braungart, Georg; Jannidis, Fotis; Gendolla, Peter

    2007-01-01

    Robinson and others have recently called for dynamic and collaborative digital scholarly editions. Annotation is a key component for editions that are not merely passive, read-only repositories of knowledge. Annotation facilities (both annotation creation and display), however, require complex

  11. GRaSP: A Multilayered Annotation Scheme for Perspectives

    NARCIS (Netherlands)

    van Son, C.M.; Caselli, T.; Fokkens, A.S.; Maks, E.; Morante Vallejo, R.; Aroyo, L.M.; Vossen, P.T.J.M.

    2016-01-01

    This paper presents a framework and methodology for the annotation of perspectives in text. In the last decade, different aspects of linguistic encoding of perspectives have been targeted as separated phenomena through different annotation initiatives. We propose an annotation scheme that integrates

  12. Annotation Method (AM): SE22_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE22_AM1 Annotation based on a grading system Collected mass spectral features, together with predicted molecular formulae and putative structures, were provided as metabolite annotations. Comparison with public databases was performed. A grading system was introduced to describe the evidence supporting the annotations. ...

  13. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...

  14. Systematic interpretation of microarray data using experiment annotations

    Directory of Open Access Journals (Sweden)

    Frohme Marcus

    2006-12-01

    Full Text Available Abstract Background: Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results: We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.

  15. Annotation of the Domestic Pig Genome by Quantitative Proteogenomics.

    Science.gov (United States)

    Marx, Harald; Hahne, Hannes; Ulbrich, Susanne E; Schnieke, Angelika; Rottmann, Oswald; Frishman, Dmitrij; Kuster, Bernhard

    2017-08-04

    The pig is one of the earliest domesticated animals in the history of human civilization and represents one of the most important livestock animals. The recent sequencing of the Sus scrofa genome was a major step toward the comprehensive understanding of porcine biology, evolution, and its utility as a promising large animal model for biomedical and xenotransplantation research. However, the functional and structural annotation of the Sus scrofa genome is far from complete. Here, we present mass spectrometry-based quantitative proteomics data of nine juvenile organs and six embryonic stages between 18 and 39 days after gestation. We found that the data provide evidence for and improve the annotation of 8176 protein-coding genes including 588 novel and 321 refined gene models. The analysis of tissue-specific proteins and the temporal expression profiles of embryonic proteins provides an initial functional characterization of expressed protein interaction networks and modules including as yet uncharacterized proteins. Comparative transcript and protein expression analysis to human organs reveal a moderate conservation of protein translation across species. We anticipate that this resource will facilitate basic and applied research on Sus scrofa as well as its porcine relatives.

  16. Identifying and annotating human bifunctional RNAs reveals their versatile functions.

    Science.gov (United States)

    Chen, Geng; Yang, Juan; Chen, Jiwei; Song, Yunjie; Cao, Ruifang; Shi, Tieliu; Shi, Leming

    2016-10-01

    Bifunctional RNAs that possess both protein-coding and noncoding functional properties have been less explored and are poorly understood. Here we systematically explored the characteristics and functions of such human bifunctional RNAs by integrating tandem mass spectrometry and RNA-seq data. We first constructed a pipeline to identify and annotate bifunctional RNAs, leading to the characterization of 132 high-confidence bifunctional RNAs. Our analyses indicate that bifunctional RNAs may be involved in human embryonic development and can be functional in diverse tissues. Moreover, bifunctional RNAs could interact with multiple miRNAs and RNA-binding proteins to exert their corresponding roles. Bifunctional RNAs may also function as competing endogenous RNAs to regulate the expression of many genes by competing for common targeting miRNAs. Finally, somatic mutations of diverse carcinomas may have harmful effects on the corresponding bifunctional RNAs. Collectively, our study not only provides the pipeline for identifying and annotating bifunctional RNAs but also reveals their important gene-regulatory functions.

  17. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

    Energy Technology Data Exchange (ETDEWEB)

    Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  18. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

    Science.gov (United States)

    Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  19. Multi-Atlas Segmentation using Partially Annotated Data: Methods and Annotation Strategies.

    Science.gov (United States)

    Koch, Lisa M; Rajchl, Martin; Bai, Wenjia; Baumgartner, Christian F; Tong, Tong; Passerat-Palmbach, Jonathan; Aljabar, Paul; Rueckert, Daniel

    2017-08-22

    Multi-atlas segmentation is a widely used tool in medical image analysis, providing robust and accurate results by learning from annotated atlas datasets. However, the availability of fully annotated atlas images for training is limited due to the time required for the labelling task. Segmentation methods requiring only a proportion of each atlas image to be labelled could therefore reduce the workload on expert raters tasked with annotating atlas images. To address this issue, we first re-examine the labelling problem common in many existing approaches and formulate its solution in terms of a Markov Random Field energy minimisation problem on a graph connecting atlases and the target image. This provides a unifying framework for multi-atlas segmentation. We then show how modifications in the graph configuration of the proposed framework enable the use of partially annotated atlas images and investigate different partial annotation strategies. The proposed method was evaluated on two Magnetic Resonance Imaging (MRI) datasets for hippocampal and cardiac segmentation. Experiments were performed aimed at (1) recreating existing segmentation techniques with the proposed framework and (2) demonstrating the potential of employing sparsely annotated atlas data for multi-atlas segmentation.

  20. Molecular characterization and genetic diversity of Jatropha curcas L. in Costa Rica

    Directory of Open Access Journals (Sweden)

    Marcela Vásquez-Mayorga

    2017-02-01

    Full Text Available We estimated the genetic diversity of 50 Jatropha curcas samples from the Costa Rican germplasm bank using 18 EST-SSR, one G-SSR and nrDNA-ITS markers. We also evaluated the phylogenetic relationships among samples using nuclear ribosomal ITS markers. Non-toxicity was evaluated using G-SSR and SCAR markers. A Neighbor-Joining (NJ) tree and a Maximum Likelihood (ML) tree were constructed using SSR markers and ITS sequences, respectively. Heterozygosity was moderate (He = 0.346), but considerable compared to worldwide values for J. curcas. The PIC (PIC = 0.274) and inbreeding coefficient (f = −0.102) were both low. Clustering was not related to the geographical origin of accessions. International accessions clustered independently of collection sites, suggesting a lack of genetic structure, probably due to the wide distribution of this crop and ample gene flow. Molecular markers identified only one non-toxic accession (JCCR-24) from Mexico. This work is part of a countrywide effort to characterize the genetic diversity of the Jatropha curcas germplasm bank in Costa Rica.
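
    The diversity statistics quoted in this record (He, PIC, f) can be illustrated with a minimal Python sketch. It assumes the common textbook forms: expected heterozygosity He = 1 - sum(p_i^2), the Botstein-style polymorphism information content, and the simple fixation index f = 1 - Ho/He. The record does not say which estimators the authors used, and the allele frequencies below are hypothetical, not data from the study.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies p_i at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


def inbreeding_coefficient(observed_het, expected_het):
    """Simple fixation index f = 1 - Ho/He (assumed form, not the study's estimator)."""
    return 1.0 - observed_het / expected_het


# Hypothetical allele frequencies for a single SSR locus (illustration only).
locus = [0.5, 0.5]
print(expected_heterozygosity(locus))
print(pic(locus))
```

    Under this simple form, a negative f corresponds to observed heterozygosity exceeding the expected value.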

  1. FragKB: structural and literature annotation resource of conserved peptide fragments and residues.

    Directory of Open Access Journals (Sweden)

    Ashish V Tendulkar

    Full Text Available BACKGROUND: FragKB (Fragment Knowledgebase is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.

  2. Annotation: Hyperlexia: disability or superability?

    Science.gov (United States)

    Grigorenko, Elena L; Klin, Ami; Volkmar, Fred

    2003-11-01

    Hyperlexia is the phenomenon of spontaneous and precocious mastery of single-word reading that has been of interest to clinicians and researchers since the beginning of the last century. An extensive search of publications on the subject of hyperlexia was undertaken and all available publications were reviewed. The literature can be subdivided into discussions of the following issues: (1) whether hyperlexia is a phenomenon that is characteristic only of specific clinical populations (e.g., children with developmental delays) or whether it can also be observed in the general population; (2) whether hyperlexia is a distinct syndrome comorbid with a number of different disorders or whether it is a part of the spectrum of some other clinical condition(s); (3) whether hyperlexia should be defined through single-word reading superiority with regard to reading comprehension, vocabulary, general intelligence, any combination of the three, or all three characteristics; (4) whether there is a specific neuropsychological profile associated with hyperlexia; (5) whether hyperlexia is characterized by a particular developmental profile; and (6) whether hyperlexia should be viewed as a disability (deficit) or superability (talent). We interpret the literature as supporting the view that hyperlexia is a superability demonstrated by a very specific group of individuals with developmental disorders (defined through unexpected single-word reading in the context of otherwise suppressed intellectual functioning) rather than as a disability exhibited by a portion of the general population (defined through a discrepancy between levels of single-word reading and comprehension). We simultaneously argue, however, that multifaceted and multi-methodological approaches to studying the phenomenon of hyperlexia, defined within the research framework of understanding single-word reading, are warranted and encouraged.

  3. Development of EST-SSR markers in flowering Chinese cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsen et Lee) based on de novo transcriptomic assemblies

    Science.gov (United States)

    Flowering Chinese cabbage is one of the most important vegetable crops in southern China. Genetic improvement of various agronomic traits in this crop is underway to meet high market demand in the region, but the progress is hampered by limited number of molecular markers available in this crop. Thi...

  4. Cadec: A corpus of adverse drug event annotations.

    Science.gov (United States)

    Karimi, Sarvnaz; Metke-Jimenez, Alejandro; Kemp, Madonna; Wang, Chen

    2015-06-01

    CSIRO Adverse Drug Event Corpus (Cadec) is a new, richly annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available at https://data.csiro.au. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Automated annotation of microbial proteomes in SWISS-PROT.

    Science.gov (United States)

    Gattiker, Alexandre; Michoud, Karine; Rivoire, Catherine; Auchincloss, Andrea H; Coudert, Elisabeth; Lima, Tania; Kersey, Paul; Pagni, Marco; Sigrist, Christian J A; Lachaize, Corinne; Veuthey, Anne Lise; Gasteiger, Elisabeth; Bairoch, Amos

    2003-02-01

    Large-scale sequencing of prokaryotic genomes demands the automation of certain annotation tasks currently manually performed in the production of the SWISS-PROT protein knowledgebase. The HAMAP project, or 'High-quality Automated and Manual Annotation of microbial Proteomes', aims to integrate manual and automatic annotation methods in order to enhance the speed of the curation process while preserving the quality of the database annotation. Automatic annotation is only applied to entries that belong to manually defined orthologous families and to entries with no identifiable similarities (ORFans). Many checks are enforced in order to prevent the propagation of wrong annotation and to spot problematic cases, which are channelled to manual curation. The results of this annotation are integrated in SWISS-PROT, and a website is provided at http://www.expasy.org/sprot/hamap/.

  6. Annotated trajectories and the Space-Time-Cube

    DEFF Research Database (Denmark)

    Kveladze, Irma; Kraak, Menno-Jan

    2012-01-01

    Movement data is collected by nearly everyone at any time. This data is not limited to the trajectories of people; today’s technology also allows the simultaneous collection of trip-related annotations like photos, videos, voice, and texts. The combination of trajectories and annotations is a rich...... source to monitor movement in a context and discover known and unknown patterns. Often the annotations are implicitly geotagged by the GPS-enabled devices, like phones and cameras, which are used to collect the annotations. This allows a match between the track and annotation based on coordinates....... Otherwise the trajectories and annotations can be matched based on their respective time stamps. The geotagged material is often used on social media sites to exchange the whereabouts of people. The annotations are placed on dedicated sites such as Flickr and Panoramio. Via mash-ups it is also possible...
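
    The time-stamp matching mentioned in this record can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a non-empty track sorted by time, timestamps given as plain numbers (for example seconds), and hypothetical function names and sample data.

```python
from bisect import bisect_left


def nearest_fix(track, t):
    """Return the (time, lat, lon) fix in a time-sorted track closest to time t."""
    times = [fix[0] for fix in track]
    i = bisect_left(times, t)
    # Only the fixes just before and just after t can be the closest ones.
    candidates = track[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda fix: abs(fix[0] - t))


def match_annotations(track, annotations):
    """Attach each (time, label) annotation to the position of its nearest fix."""
    matched = []
    for t, label in annotations:
        _, lat, lon = nearest_fix(track, t)
        matched.append((label, lat, lon))
    return matched


# Hypothetical data: three GPS fixes and two annotations without coordinates.
track = [(0, 52.0, 6.0), (60, 52.001, 6.002), (120, 52.003, 6.004)]
notes = [(50, "photo"), (125, "voice memo")]
print(match_annotations(track, notes))
```

    A real workflow would probably also reject matches whose time gap exceeds some tolerance; that threshold is left out here because the record does not specify one.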

  7. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts; genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant...... biology and genetics studies. We present an improved Lotus genome assembly and annotation, a catalog of natural variation based on re-sequencing of 29 accessions, and describe the involvement of small RNAs in the plant-bacteria symbiosis. Blueberries contain anthocyanins, other pigments and various...... polyphenolic compounds, which have been linked to protection against diabetes, cardiovascular disease and age-related cognitive decline. We present the first genome-guided approach in blueberry to identify genes involved in the synthesis of health-protective compounds. Using RNA-Seq data from five stages...

  8. Annotating functional RNAs in genomes using Infernal.

    Science.gov (United States)

    Nawrocki, Eric P

    2014-01-01

    Many different types of functional non-coding RNAs participate in a wide range of important cellular functions but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools utilize covariance models (CMs), statistical models of the conserved sequence, and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.

  9. Deburring: an annotated bibliography. Volume V

    Energy Technology Data Exchange (ETDEWEB)

    Gillespie, L.K.

    1978-01-01

    An annotated summary of 204 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred.

  10. Deburring: an annotated bibliography. Volume V

    International Nuclear Information System (INIS)

    Gillespie, L.K.

    1978-01-01

    An annotated summary of 204 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred.

  11. Gastrointestinal hormone research - with a Scandinavian annotation

    DEFF Research Database (Denmark)

    Rehfeld, Jens F

    2015-01-01

    Gastrointestinal hormones are peptides released from neuroendocrine cells in the digestive tract. More than 30 hormone genes are currently known to be expressed in the gut, which makes it the largest hormone-producing organ in the body. Modern biology makes it feasible to conceive the hormones un......, but also constitute regulatory systems operating in the whole organism. This overview of gut hormone biology is supplemented with an annotation on some Scandinavian contributions to gastrointestinal hormone research....

  12. Nonlinear Deep Kernel Learning for Image Annotation.

    Science.gov (United States)

    Jiu, Mingyuan; Sahbi, Hichem

    2017-02-08

    Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each of which involves a combination of several elementary or intermediate kernels and results in a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semi-supervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show a clear gain compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.
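
    The shallow starting point described in this record, a convex linear combination of elementary kernels, can be sketched in plain Python. This is only an illustration of that principle: the weights are fixed by hand rather than learned, the two elementary kernels and all function names are assumptions, and the deep multi-layer construction proposed in the paper is not reproduced.

```python
import math


def linear_kernel(x, y):
    """Elementary kernel: dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Elementary kernel: Gaussian RBF with an assumed bandwidth parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(x, y, kernels, weights):
    """Convex combination sum_k w_k * K_k(x, y) with w_k >= 0 and sum(w_k) = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * k(x, y) for w, k in zip(weights, kernels))


def gram_matrix(points, kernels, weights):
    """Gram matrix of the combined kernel over a list of points."""
    return [[combined_kernel(p, q, kernels, weights) for q in points]
            for p in points]


# Hypothetical 2-D points and hand-picked weights (MKL would learn the weights).
points = [(1.0, 0.0), (0.0, 1.0)]
print(gram_matrix(points, [linear_kernel, rbf_kernel], [0.5, 0.5]))
```

    Because a non-negative weighted sum of positive semi-definite kernels is itself positive semi-definite, the combined Gram matrix can be passed to any kernel method that accepts a precomputed kernel.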

  13. Annotating breast cancer microarray samples using ontologies

    Science.gov (United States)

    Liu, Hongfang; Li, Xin; Yoon, Victoria; Clarke, Robert

    2008-01-01

    As the most common cancer among women, breast cancer results from the accumulation of mutations in essential genes. Recent advances in high-throughput gene expression microarray technology have inspired researchers to use the technology to assist breast cancer diagnosis, prognosis, and treatment prediction. However, the high dimensionality of microarray experiments and public access to data from many experiments have caused inconsistencies which initiated the development of controlled terminologies and ontologies for annotating microarray experiments, such as the standard microarray Gene Expression Data (MGED) ontology (MO). In this paper, we developed BCM-CO, an ontology tailored specifically for indexing clinical annotations of breast cancer microarray samples from the NCI Thesaurus. Our research showed that the coverage of NCI Thesaurus is very limited with respect to i) terms used by researchers to describe breast cancer histology (covering 22 out of 48 histology terms); ii) breast cancer cell lines (covering one out of 12 cell lines); and iii) classes corresponding to the breast cancer grading and staging. By incorporating a wider range of those terms into BCM-CO, we were able to index breast cancer microarray samples from GEO using BCM-CO and MGED ontology and developed a prototype system with a web interface that allows the retrieval of microarray data based on the ontology annotations. PMID:18999108

  14. Annotation Graphs: A Graph-Based Visualization for Meta-Analysis of Data Based on User-Authored Annotations.

    Science.gov (United States)

    Zhao, Jian; Glueck, Michael; Breslav, Simon; Chevalier, Fanny; Khan, Azam

    2017-01-01

    User-authored annotations of data can support analysts in the activity of hypothesis generation and sensemaking, where it is not only critical to document key observations, but also to communicate insights between analysts. We present annotation graphs, a dynamic graph visualization that enables meta-analysis of data based on user-authored annotations. The annotation graph topology encodes annotation semantics, which describe the content of and relations between data selections, comments, and tags. We present a mixed-initiative approach to graph layout that integrates an analyst's manual manipulations with an automatic method based on similarity inferred from the annotation semantics. Various visual graph layout styles reveal different perspectives on the annotation semantics. Annotation graphs are implemented within C8, a system that supports authoring annotations during exploratory analysis of a dataset. We apply principles of Exploratory Sequential Data Analysis (ESDA) in designing C8, and further link these to an existing task typology in the visualization literature. We develop and evaluate the system through an iterative user-centered design process with three experts, situated in the domain of analyzing HCI experiment data. The results suggest that annotation graphs are effective as a method of visually extending user-authored annotations to data meta-analysis for discovery and organization of ideas.

  15. Global profiling of Shewanella oneidensis MR-1: Expression of hypothetical genes and improved functional annotations

    Energy Technology Data Exchange (ETDEWEB)

    Picone, Alex F. [Biatech, Bothell WA; Galperin, Michael Y. [National Center for Biotechnology Information; Romine, Margaret [Pacific Northwest National Laboratory (PNNL); Higdon, Roger [Biatech, Bothell WA; Makarova, Kira S. [National Center for Biotechnology Information; Kolker, Natali [Biatech, Bothell WA; Anderson, Gordon A [ORNL; Qiu, Xiaoyun [ORNL; Babnigg, Gyorgy [Oak Ridge National Laboratory (ORNL); Beliaev, Alexander S [ORNL; Edlefsen, Paul [Biatech, Bothell WA; Elias, Dwayne A. [Pacific Northwest National Laboratory (PNNL); Gorby, Dr. Yuri A. [J. Craig Venter Institute; Holzman, Ted [Biatech, Bothell WA; Klappenbach, Joel [Michigan State University, East Lansing; Konstantinidis, Konstantinos T [Michigan State University, East Lansing; Land, Miriam L [ORNL; Lipton, Mary S. [Pacific Northwest National Laboratory (PNNL); McCue, Lee Ann [Pacific Northwest National Laboratory (PNNL); Monroe, Matthew [Pacific Northwest National Laboratory (PNNL); Pasa-Tolic, Ljiljana [Pacific Northwest National Laboratory (PNNL); Pinchuk, Grigoriy [Pacific Northwest National Laboratory (PNNL); Purvine, Samuel [Pacific Northwest National Laboratory (PNNL); Serres, Margrethe H. [Woods Hole Oceanographic Institution (WHOI), Woods Hole, MA; Tsapin, Sasha [University of Southern California; Zakrajsek, Brian A. [Pacific Northwest National Laboratory (PNNL); Zhu, Wenguang [Harvard University; Zhou, Jizhong [University of Oklahoma; Larimer, Frank W [ORNL; Lawrence, Charles E. [Wadsworth Center, Albany, NY; Riley, Monica [Woods Hole Oceanographic Institution (WHOI), Woods Hole, MA; Collart, Frank [Argonne National Laboratory (ANL); YatesIII, John R. [Scripps Research Institute, The, La Jolla, CA; Smith, Richard D. [Pacific Northwest National Laboratory (PNNL); Nealson, Kenneth H. [University of Southern California; Fredrickson, James K [Pacific Northwest National Laboratory (PNNL); Tiedje, James M. [Michigan State University, East Lansing

    2005-01-01

    The gamma-proteobacterium Shewanella oneidensis strain MR-1 is a metabolically versatile organism that can reduce a wide range of organic compounds, metal ions, and radionuclides. Similar to most other sequenced organisms, approximately 40% of the predicted ORFs in the S. oneidensis genome were annotated as uncharacterized "hypothetical" genes. We implemented an integrative approach by using experimental and computational analyses to provide more detailed insight into gene function. Global expression profiles were determined for cells after UV irradiation and under aerobic and suboxic growth conditions. Transcriptomic and proteomic analyses confidently identified 538 hypothetical genes as expressed in S. oneidensis cells both as mRNAs and proteins (33% of all predicted hypothetical proteins). Publicly available analysis tools and databases and the expression data were applied to improve the annotation of these genes. The annotation results were scored by using a seven-category schema that ranked both confidence and precision of the functional assignment. We were able to identify homologs for nearly all of these hypothetical proteins (97%), but could confidently assign exact biochemical functions for only 16 proteins (category 1; 3%). Altogether, computational and experimental evidence provided functional assignments or insights for 240 more genes (categories 2-5; 45%). These functional annotations advance our understanding of genes involved in vital cellular processes, including energy conversion, ion transport, secondary metabolism, and signal transduction. We propose that this integrative approach offers a valuable means to undertake the enormous challenge of characterizing the rapidly growing number of hypothetical proteins with each newly sequenced genome.

  16. Similarity maps and hierarchical clustering for annotating FT-IR spectral images.

    Science.gov (United States)

    Zhong, Qiaoyong; Yang, Chen; Großerüschkamp, Frederik; Kallenbach-Thieltges, Angela; Serocka, Peter; Gerwert, Klaus; Mosig, Axel

    2013-11-20

    Unsupervised segmentation of multi-spectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been utilized and evaluated in order to achieve segmentations of Fourier Transform Infrared (FT-IR) microscopic images that agree with histopathological characterization. We introduce so-called interactive similarity maps as an alternative strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps are similarly accurate to segmentations obtained from conventionally used hierarchical clustering approaches. In order to perform this comparison on quantitative grounds, we provide a scheme that allows the identification of non-horizontal cuts in dendrograms. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy. We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering based approaches, and thus are a viable and, due to their interactive nature, attractive alternative to hierarchical clustering. Our validation scheme furthermore shows that the performance of hierarchical two-means is comparable to the traditionally used Ward's clustering. As the former is much more efficient in time and memory, our results suggest another, less resource-demanding alternative for annotating large spectral images.

  17. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Full Text Available Abstract Background: Magnaporthe oryzae, the causal agent of blast disease of rice, is the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of M. oryzae genome assembly. Methods: A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results: In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins

  18. Plann: A command-line application for annotating plastome sequences.

    Science.gov (United States)

    Huang, Daisie I; Cronk, Quentin C B

    2015-08-01

    Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.

  19. Annotated bibliography of Software Engineering Laboratory literature

    Science.gov (United States)

    Morusiewicz, Linda; Valett, Jon D.

    1991-01-01

    An annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory is given. More than 100 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. All materials have been grouped into eight general subject areas for easy reference: The Software Engineering Laboratory; The Software Engineering Laboratory: Software Development Documents; Software Tools; Software Models; Software Measurement; Technology Evaluations; Ada Technology; and Data Collection. Subject and author indexes further classify these documents by specific topic and individual author.

  20. GARNET – gene set analysis with exploration of annotation relations

    Directory of Open Access Journals (Sweden)

    Seo Jihae

    2011-02-01

    Full Text Available Abstract Background: Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. Results: GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules - gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. Conclusions: GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
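
    The pair-wise kappa statistic mentioned in this abstract can be sketched as follows. This is a minimal illustration of Cohen's kappa applied to two gene sets over a shared gene universe, not GARNET's own code; the function name, the choice of universe, and the handling of the degenerate case are assumptions.

```python
def gene_set_kappa(set_a, set_b, universe):
    """Cohen's kappa for membership agreement of two gene sets.

    Each gene in `universe` is treated as one observation that is either
    inside or outside each set (hypothetical helper, not a GARNET API).
    """
    universe = set(universe)
    if not universe:
        raise ValueError("universe must not be empty")
    a = set(set_a) & universe
    b = set(set_b) & universe
    n = len(universe)

    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    # Chance agreement from the marginal totals of the 2x2 table.
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1.0:
        # Kappa is undefined when both sets are empty or both cover the
        # whole universe; 0.0 is returned here by convention (assumption).
        return 0.0
    return (observed - expected) / (1.0 - expected)
```

    Values near 1 indicate that two annotation terms label almost the same genes, while values near 0 indicate overlap no better than chance, which is what makes the statistic usable as an edge weight in an annotation network.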

  1. Third party annotation gene data set of eutherian lysozyme genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2014-12-01

    Full Text Available The eutherian comparative genomic analysis protocol annotated the most comprehensive eutherian lysozyme gene data set. Among 209 potential coding sequences, the third party annotation gene data set of eutherian lysozyme genes included 116 complete coding sequences that first described seven major gene clusters. As one new framework of future experiments, the present integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis proposed a new classification and nomenclature of eutherian lysozyme genes.

  2. Third party annotation gene data set of eutherian lysozyme genes

    OpenAIRE

    Premzl, Marko

    2014-01-01

    The eutherian comparative genomic analysis protocol annotated the most comprehensive eutherian lysozyme gene data set. Among 209 potential coding sequences, the third party annotation gene data set of eutherian lysozyme genes included 116 complete coding sequences that first described seven major gene clusters. As one new framework of future experiments, the present integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis proposed a new classification and nomencla...

  3. African American Literature, 1989-94: An Annotated Bibliography.

    Science.gov (United States)

    Miller, R. Baxter; Butts, Tracy; Jones, Sharon

    1997-01-01

    Contains an annotated bibliography of African American literature (published between 1989 and 1994), including anthologies, fiction, poetry, drama, criticism, cultural studies, biography, interviews, and letters. (TB)

  4. Review of actinide-sediment reactions with an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Ames, L.L.; Rai, D.; Serne, R.J.

    1976-02-10

    The annotated bibliography is divided into sections on chemistry and geochemistry, migration and accumulation, cultural distributions, natural distributions, and bibliographies and annual reviews. (LK)

  5. A Novel Approach to Semantic and Coreference Annotation at LLNL

    Energy Technology Data Exchange (ETDEWEB)

    Firpo, M

    2005-02-04

    A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work are evaluated relative to that of their neuro-typical peers.

  6. Annotation Method (AM): SE40_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE40_AM1 PowerGet annotation In annotation process, KEGG, KNApSAcK and LipidMAPS ar...can assign, predicted molecular formulas are used for the annotation. MS/MS patterns was used to suggest fun...p/) and MS-MS Fragment Viewer (http://webs2.kazusa.or.jp/msmsfragmentviewer/) are used for ann...lcone, Nicotinamide, Nicotinate, Pantothenate, Phloretin, Prunin, Rutin, S-Adenosyl-L-methionine, Tomatine, UMP, Uridine) are used for annotation and identification of the compounds. ...

  7. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
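
    The third codon position GC content used in this abstract is a simple per-frame statistic, sketched below. This is only an illustration of the quantity itself, not the Artemis or MICheck implementation; the function name and the `frame` parameter are assumptions.

```python
def gc3_content(sequence, frame=0):
    """Fraction of G or C at the third position of each codon.

    `frame` (0, 1 or 2) is the offset of the first codon; trailing bases
    that do not form a complete codon are ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = sequence.upper()[frame:]
    usable = len(seq) - len(seq) % 3
    third_bases = seq[2:usable:3]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

    In a GC-rich genome, the correct reading frame of a protein-coding region tends to show a markedly higher value than the other two frames, which is the signal a GC frame plot displays along the chromosome.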

  8. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    , some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism...... and communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...

  9. Annotating non-coding regions of the genome.

    Science.gov (United States)

    Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B

    2010-08-01

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
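
    The first steps described in this abstract, a per-base signal that is smoothed and then segmented into blocks, can be sketched as follows. The moving-average smoother, the fixed threshold, and all names are assumptions made for illustration; the abstract does not specify which smoothing or segmentation methods are used, and the later clustering step is not covered.

```python
def smooth(signal, window=3):
    """Centered moving average; windows are truncated at the ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def segment(smoothed, threshold):
    """Return half-open (start, end) blocks where the signal >= threshold."""
    blocks = []
    start = None
    for i, value in enumerate(smoothed):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(smoothed)))
    return blocks
```

    Each returned block corresponds to one small unit of initial annotation in the abstract's terminology; real pipelines typically replace the fixed threshold with a statistical model of background signal.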

  10. Proteogenomic analysis of polymorphisms and gene annotation divergences in prokaryotes using a clustered mass spectrometry-friendly database.

    NARCIS (Netherlands)

    Souza, G.A. de; Arntzen, M.O.; Fortuin, S.; Schurch, A.C.; Malen, H.; McEvoy, C.R.; Soolingen, D. van; Thiede, B.; Warren, R.M.; Wiker, H.G.

    2011-01-01

    Precise annotation of genes or open reading frames is still a difficult task that results in divergence even for data generated from the same genomic sequence. This has an impact in further proteomic studies, and also compromises the characterization of clinical isolates with many specific genetic

  11. Discovering and annotating fish early life-stage (FELS) adverse outcome pathways: Putting the research strategy into practice

    Science.gov (United States)

    In May 2012, a HESI-sponsored expert workshop yielded a proposed research strategy for systematically discovering, characterizing, and annotating fish early life-stage (FELS) adverse outcome pathways (AOPs) as well as prioritizing AOP development in light of current restrictions ...

  12. Phenex: ontological annotation of phenotypic diversity.

    Directory of Open Access Journals (Sweden)

    James P Balhoff

    2010-05-01

    Full Text Available Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge.Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices.Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

  13. Training nuclei detection algorithms with simple annotations

    Directory of Open Access Journals (Sweden)

    Henning Kost

    2017-01-01

    Full Text Available Background: Generating good training datasets is essential for machine learning-based nuclei detection methods. However, creating exhaustive nuclei contour annotations, to derive optimal training data from, is often infeasible. Methods: We compared different approaches for training nuclei detection methods solely based on nucleus center markers. Such markers contain less accurate information, especially with regard to nuclear boundaries, but can be produced much more easily and in greater quantities. The approaches use different automated sample extraction methods to derive image positions and class labels from nucleus center markers. In addition, the approaches use different automated sample selection methods to improve the detection quality of the classification algorithm and reduce the run time of the training process. We evaluated the approaches based on a previously published generic nuclei detection algorithm and a set of Ki-67-stained breast cancer images. Results: A Voronoi tessellation-based sample extraction method produced the best performing training sets. However, subsampling of the extracted training samples was crucial. Even simple class balancing improved the detection quality considerably. The incorporation of active learning led to a further increase in detection quality. Conclusions: With appropriate sample extraction and selection methods, nuclei detection algorithms trained on the basis of simple center marker annotations can produce comparable quality to algorithms trained on conventionally created training sets.

  14. Evolutionary interrogation of human biology in well-annotated genomic framework of rhesus macaque.

    Science.gov (United States)

    Zhang, Shi-Jian; Liu, Chu-Jun; Yu, Peng; Zhong, Xiaoming; Chen, Jia-Yu; Yang, Xinzhuang; Peng, Jiguang; Yan, Shouyu; Wang, Chenqu; Zhu, Xiaotong; Xiong, Jingwei; Zhang, Yong E; Tan, Bertrand Chin-Ming; Li, Chuan-Yun

    2014-05-01

    With genome sequence and composition highly analogous to human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, the RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical link of human-biased regulations to human-specific traits, by using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research in general and serves as an important resource for in-depth evolutionary studies of human biology.

  15. Annotated receipts capture household food purchases from a broad range of sources

    Directory of Open Access Journals (Sweden)

    Shimotsu Scott T

    2009-07-01

    Full Text Available Abstract Background: Accurate measurement of household food purchase behavior (HFPB) is important for understanding its association with household characteristics, individual dietary intake and neighborhood food retail outlets. However, little research has been done to develop measures of HFPB. The main objective of this paper is to describe the development of a measure of HFPB using annotated food purchase receipts. Methods: Households collected and annotated food purchase receipts for a four-week period as part of the baseline assessment of a household nutrition intervention. Receipts were collected from all food sources, including grocery stores and restaurants. Households (n = 90) were recruited from the community as part of an obesity prevention intervention conducted in 2007–2008 in Minneapolis, Minnesota, USA. Household primary shoppers were trained to follow a standardized receipt collection and annotation protocol. Annotated receipts were mailed weekly to research staff. Staff coded the receipt data and entered it into a database. Total food dollars, proportion of food dollars, and ounces of food purchased were examined for different food sources and food categories. Descriptive statistics and correlations are presented. Results: A total of 2,483 receipts were returned by 90 households. Home sources comprised 45% of receipts and eating-out sources 55%. Eating-out entrees were proportionally the largest single food category based on counts (16.6%) and dollars ($106 per month). Two-week expenditures were highly correlated (r = 0.83) with four-week expenditures. Conclusion: Receipt data provided important quantitative information about HFPB from a wide range of sources and food categories. Two weeks may be adequate to reliably characterize HFPB using annotated receipts.

  16. Annotated receipts capture household food purchases from a broad range of sources.

    Science.gov (United States)

    French, Simone A; Wall, Melanie; Mitchell, Nathan R; Shimotsu, Scott T; Welsh, Ericka

    2009-07-01

    Accurate measurement of household food purchase behavior (HFPB) is important for understanding its association with household characteristics, individual dietary intake and neighborhood food retail outlets. However, little research has been done to develop measures of HFPB. The main objective of this paper is to describe the development of a measure of HFPB using annotated food purchase receipts. Households collected and annotated food purchase receipts for a four-week period as part of the baseline assessment of a household nutrition intervention. Receipts were collected from all food sources, including grocery stores and restaurants. Households (n = 90) were recruited from the community as part of an obesity prevention intervention conducted in 2007-2008 in Minneapolis, Minnesota, USA. Household primary shoppers were trained to follow a standardized receipt collection and annotation protocol. Annotated receipts were mailed weekly to research staff. Staff coded the receipt data and entered it into a database. Total food dollars, proportion of food dollars, and ounces of food purchased were examined for different food sources and food categories. Descriptive statistics and correlations are presented. A total of 2,483 receipts were returned by 90 households. Home sources comprised 45% of receipts and eating-out sources 55%. Eating-out entrees were proportionally the largest single food category based on counts (16.6%) and dollars ($106 per month). Two-week expenditures were highly correlated (r = 0.83) with four-week expenditures. Receipt data provided important quantitative information about HFPB from a wide range of sources and food categories. Two weeks may be adequate to reliably characterize HFPB using annotated receipts.

  17. Annotation-Based Learner's Personality Modeling in Distance Learning Context

    Science.gov (United States)

    Omheni, Nizar; Kalboussi, Anis; Mazhoud, Omar; Kacem, Ahmed Hadj

    2016-01-01

    Researchers in distance education are interested in observing and modeling learners' personality profiles, and adapting their learning experiences accordingly. When learners read and interact with their reading materials, they do unselfconscious activities like annotation which may be a key feature of their personalities. Annotation activity…

  18. Collaborative Paper-Based Annotation of Lecture Slides

    Science.gov (United States)

    Steimle, Jurgen; Brdiczka, Oliver; Muhlhauser, Max

    2009-01-01

    In a study of notetaking in university courses, we found that the large majority of students prefer paper to computer-based media like Tablet PCs for taking notes and making annotations. Based on this finding, we developed CoScribe, a concept and system which supports students in making collaborative handwritten annotations on printed lecture…

  19. Ab initio gene identification: prokaryote genome annotation with ...

    Indian Academy of Sciences (India)

    Unknown

    In this paper we compare the predictions of two of the nonconsensus methods, namely GeneScan and GLIMMER with annotation of three completely sequenced genomes of the organisms Haemophilus influenzae, Helicobacter pylori, and Campylobacter jejuni. All these organisms have been annotated previously using the ...

  20. Propagating annotations of molecular networks using in silico fragmentation.

    Science.gov (United States)

    da Silva, Ricardo R; Wang, Mingxun; Nothias, Louis-Félix; van der Hooft, Justin J J; Caraballo-Rodríguez, Andrés Mauricio; Fox, Evan; Balunas, Marcy J; Klassen, Jonathan L; Lopes, Norberto Peporine; Dorrestein, Pieter C

    2018-04-18

    The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One of the alternative approaches used to annotate an unknown fragmentation mass spectrum is through the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to a MS/MS spectrum in spectral libraries. This is accomplished through creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.

  1. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  2. First generation annotations for the fathead minnow (Pimephales promelas) genome

    Science.gov (United States)

    Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...

  3. Online Metacognitive Strategies, Hypermedia Annotations, and Motivation on Hypertext Comprehension

    Science.gov (United States)

    Shang, Hui-Fang

    2016-01-01

    This study examined the effect of online metacognitive strategies, hypermedia annotations, and motivation on reading comprehension in a Taiwanese hypertext environment. A path analysis model was proposed based on the assumption that if English as a foreign language learners frequently use online metacognitive strategies and hypermedia annotations,…

  4. Behavioral Contributions to "Teaching of Psychology": An Annotated Bibliography

    Science.gov (United States)

    Karsten, A. M.; Carr, J. E.

    2008-01-01

    An annotated bibliography that summarizes behavioral contributions to the journal "Teaching of Psychology" from 1974 to 2006 is provided. A total of 116 articles of potential utility to college-level instructors of behavior analysis and related areas were identified, annotated, and organized into nine categories for ease of accessibility.…

  5. Code Generation from Pragmatics Annotated Coloured Petri Nets

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge

    using a sub-class of CPNs, called Pragmatics Annotated CPNs (PA-CPNs). PA-CPNs give structure to the protocol models and allow the models to be annotated with code generation pragmatics. These pragmatics are used by our code generation approach to identify and execute the appropriate code generation...

  6. Orienteering: An Annotated Bibliography = Orientierungslauf: Eine kommentierte Bibliographie.

    Science.gov (United States)

    Seiler, Roland, Ed.; Hartmann, Wolfgang, Ed.

    1994-01-01

    Annotated bibliography of 220 books, monographs, and journal articles on orienteering published 1984-94, from SPOLIT database of the Federal Institute of Sport Science (Cologne, Germany). Annotations in English or German. Ten sections including psychological, physiological, health, sociological, and environmental aspects; training and coaching;…

  7. International Development and the Human Environment. An Annotated Bibliography.

    Science.gov (United States)

    1974

    Most of the material in this annotated bibliography has been selected from the literature published between 1968 and 1972. Each annotation and citation is indexed by author, subject, and publisher. Entries are organized into 11 chapters: Environment, Development, and Conservation of Natural Resources; The Third World: Development and Economic…

  8. Twenty Years of "Writing Center Journal Scholarship": An Annotated Bibliography.

    Science.gov (United States)

    DeShaw, Dana; Mullin, Joan; DeCiccio, Albert C.

    2000-01-01

    Presents an annotated bibliography tracing 20 years of "Writing Center Journal" scholarship covering a variety of issues which include teacher training, critical thinking, writing apprehension, peer tutoring, Internet sources and individual instruction. Contains annotations of all the articles and reviews published in this journal's…

  9. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...... acidocaldarius genome, and several plasmids and viruses have so far been analysed and annotated using MUTAGEN....

  10. Annotating with Propp's Morphology of the Folktale: Reproducibility and Trainability

    NARCIS (Netherlands)

    Fisseni, B.; Kurji, A.; Löwe, B.

    2014-01-01

    We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system needs a significant amount of training, but that with sufficient time investment, it

  11. Developing Annotation Solutions for Online Data Driven Learning

    Science.gov (United States)

    Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

    2009-01-01

    Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…

  12. Automatic Annotation Method on Learners' Opinions in Case Method Discussion

    Science.gov (United States)

    Samejima, Masaki; Hisakane, Daichi; Komoda, Norihisa

    2015-01-01

    Purpose: The purpose of this paper is to automatically annotate learners' opinions with an attribute of a problem, a solution, or no annotation, in order to support the learners' discussion without a facilitator. The case method aims at discussing problems and solutions in a target case. However, the learners miss discussing some of the problems and solutions.…

  13. The GATO gene annotation tool for research laboratories.

    Science.gov (United States)

    Fujita, A; Massirer, K B; Durham, A M; Ferreira, C E; Sogayar, M C

    2005-11-01

    Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  14. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence-error tolerant, maintaining its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  15. The RAST Server: Rapid Annotations using Subsystems Technology

    Directory of Open Access Journals (Sweden)

    Overbeek Ross A

    2008-02-01

    Full Text Available Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

  16. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline.

    Science.gov (United States)

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-11-01

    MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.

  17. Sharing Map Annotations in Small Groups: X Marks the Spot

    Science.gov (United States)

    Congleton, Ben; Cerretani, Jacqueline; Newman, Mark W.; Ackerman, Mark S.

    Advances in location-sensing technology, coupled with an increasingly pervasive wireless Internet, have made it possible (and increasingly easy) to access and share information with context of one’s geospatial location. We conducted a four-phase study, with 27 students, to explore the practices surrounding the creation, interpretation and sharing of map annotations in specific social contexts. We found that annotation authors consider multiple factors when deciding how to annotate maps, including the perceived utility to the audience and how their contributions will reflect on the image they project to others. Consumers of annotations value the novelty of information, but must be convinced of the author’s credibility. In this paper we describe our study, present the results, and discuss implications for the design of software for sharing map annotations.

  18. Semantator: annotating clinical narratives with semantic web ontologies.

    Science.gov (United States)

    Song, Dezhao; Chute, Christopher G; Tao, Cui

    2012-01-01

    To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manually annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfying. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.

  19. AUTOMATIC MUSCLE PERIMYSIUM ANNOTATION USING DEEP CONVOLUTIONAL NEURAL NETWORK.

    Science.gov (United States)

    Sapkota, Manish; Xing, Fuyong; Su, Hai; Yang, Lin

    2015-04-01

    Diseased skeletal muscle expresses mononuclear cell infiltration in the regions of perimysium. Accurate annotation or segmentation of perimysium can help biologists and clinicians to determine individualized patient treatment and allow for reasonable prognostication. However, manual perimysium annotation is time consuming and prone to inter-observer variations. Meanwhile, the presence of ambiguous patterns in muscle images significantly challenges many traditional automatic annotation algorithms. In this paper, we propose an automatic perimysium annotation algorithm based on a deep convolutional neural network (CNN). We formulate the automatic annotation of perimysium in muscle images as a pixel-wise classification problem, and the CNN is trained to label each image pixel with raw RGB values of the patch centered at the pixel. The algorithm is applied to 82 diseased skeletal muscle images. We have achieved an average precision of 94% on the test dataset.
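    The pixel-wise formulation described above can be illustrated with a small sketch: every pixel is labeled from the raw RGB patch centered on it. This is not the authors' implementation. The helper names, the edge-clamping border rule, and the threshold classifier standing in for the trained CNN are all assumptions made for illustration.

```python
def extract_patch(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) RGB patch centered at (row, col).

    `image` is a list of rows, each a list of (r, g, b) tuples. Coordinates
    outside the image are clamped to the nearest edge pixel (an assumption;
    the abstract does not say how borders are handled).
    """
    height, width = len(image), len(image[0])
    patch = []
    for dr in range(-half, half + 1):
        r = min(max(row + dr, 0), height - 1)
        patch_row = []
        for dc in range(-half, half + 1):
            c = min(max(col + dc, 0), width - 1)
            patch_row.append(image[r][c])
        patch.append(patch_row)
    return patch


def label_image(image, classify, half=1):
    """Label every pixel by applying `classify` to the patch around it."""
    return [
        [classify(extract_patch(image, r, c, half)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def mean_red_classifier(patch):
    """Hypothetical stand-in for the trained CNN: 1 if mean red exceeds 127."""
    reds = [pixel[0] for patch_row in patch for pixel in patch_row]
    return 1 if sum(reds) / len(reds) > 127 else 0


# Toy 3x4 image: two black columns on the left, two red columns on the right.
image = [[(255, 0, 0) if c >= 2 else (0, 0, 0) for c in range(4)] for _ in range(3)]
labels = label_image(image, mean_red_classifier)
print(labels)
```

    In a real pipeline the `classify` argument would wrap a trained network's forward pass. Keeping it as a plain callable makes the patch extraction testable on its own.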

  20. Accurate annotation of protein-coding genes in mitochondrial genomes.

    Science.gov (United States)

    Al Arab, Marwa; Höner Zu Siederdissen, Christian; Tout, Kifah; Sahyoun, Abdullah H; Stadler, Peter F; Bernt, Matthias

    2017-01-01

    Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high quality annotations are a prerequisite for downstream analyses. Therefore, we present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). The pipeline builds taxon-specific enhanced multiple sequence alignments (MSA) of already annotated sequences and corresponding HMMs using an approximation of the phylogeny. The MSAs are enhanced by fixing unannotated frameshifts, purging of wrong sequences, and removal of non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which are unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Annotated bibliography of software engineering laboratory literature

    Science.gov (United States)

    Kistler, David; Bristow, John; Smith, Don

    1994-01-01

    This document is an annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory. Nearly 200 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. This document has been updated and reorganized substantially since the original version (SEL-82-006, November 1982). All materials have been grouped into eight general subject areas for easy reference: (1) The Software Engineering Laboratory; (2) The Software Engineering Laboratory: Software Development Documents; (3) Software Tools; (4) Software Models; (5) Software Measurement; (6) Technology Evaluations; (7) Ada Technology; and (8) Data Collection. This document contains an index of these publications classified by individual author.

  2. Entrainment: an annotated bibliography. Interim report

    International Nuclear Information System (INIS)

    Carrier, R.F.; Hannon, E.H.

    1979-04-01

    The 604 annotated references in this bibliography on the effects of pumped entrainment of aquatic organisms through the cooling systems of thermal power plants were compiled from published and unpublished literature and cover the years 1947 through 1977. References to published literature were obtained by searching large-scale commercial data bases, ORNL in-house-generated data bases, relevant journals, and periodical bibliographies. The unpublished literature is a compilation of Sections 316(a) and 316(b) demonstrations, environmental impact statements, and environmental reports prepared by the utilities in compliance with Federal Water Pollution Control Administration regulations. The bibliography includes references on monitoring studies at power plant sites, laboratory studies of physical and biological effects on entrained organisms, engineering strategies for the mitigation of entrainment effects, and selected theoretical studies concerned with the methodology for determining entrainment effects.

  3. Re-annotation of the woodland strawberry (Fragaria vesca) genome.

    Science.gov (United States)

    Darwish, Omar; Shahan, Rachel; Liu, Zhongchi; Slovin, Janet P; Alkharouf, Nadim W

    2015-01-27

    Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America and while it produces edible fruits, it is most highly useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223, 2011]. The first generation annotation (version 1.1) was developed using GeneMark-ES+ [Nuc Acids Res 33:6494, 2005], which is a self-training gene prediction tool that relies primarily on the combination of ab initio predictions with mapping high confidence ESTs in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1. The new annotation, which was achieved using Maker, describes many more predicted protein coding genes compared to the GeneMark generated annotation that is currently hosted at the Genome Database for Rosaceae (http://www.rosaceae.org/). Our new annotation also results in an increase in the overall total coding length, and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results. Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved existing annotations by increasing the annotation accuracy based on extensive transcriptome data. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

  4. The effectiveness of annotated (vs. non-annotated) digital pathology slides as a teaching tool during dermatology and pathology residencies.

    Science.gov (United States)

    Marsch, Amanda F; Espiritu, Baltazar; Groth, John; Hutchens, Kelli A

    2014-06-01

    With today's technology, paraffin-embedded, hematoxylin & eosin-stained pathology slides can be scanned to generate high quality virtual slides. Using proprietary software, digital images can also be annotated with arrows, circles and boxes to highlight certain diagnostic features. Previous studies assessing digital microscopy as a teaching tool did not involve the annotation of digital images. The objective of this study was to compare the effectiveness of annotated digital pathology slides versus non-annotated digital pathology slides as a teaching tool during dermatology and pathology residencies. A study group composed of 31 dermatology and pathology residents was asked to complete an online pre-quiz consisting of 20 multiple choice style questions, each associated with a static digital pathology image. After completion, participants were given access to an online tutorial composed of digitally annotated pathology slides and subsequently asked to complete a post-quiz. A control group of 12 residents completed a non-annotated version of the tutorial. Nearly all participants in the study group improved their quiz score, with an average improvement of 17%, versus only 3% (P = 0.005) in the control group. These results support the notion that annotated digital pathology slides are superior to non-annotated slides for the purpose of resident education. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  5. Current and future trends in marine image annotation software

    Science.gov (United States)

    Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

    2016-12-01

    Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation, the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) has enabled over 500 publications to date. We review the functioning, application trends and developments, by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface, with a video player or image browser that recognizes a specific time code or image code, allowing events to be logged in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software by the capability of integrating data associated with video collection, the simplest being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, annotating posteriorly, and interacting with a database. These range from simple annotation interfaces to full onboard data management systems, with a variety of toolboxes. Advanced packages allow data from multiple sensors or multiple annotators to be input and displayed via intranet or internet. Posterior human-mediated annotation often includes tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or analyzing the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images.

  6. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Science.gov (United States)

    Cao, Jianfang; Chen, Lichao

    2015-01-01

    With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance. PMID:25838818

  7. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Directory of Open Access Journals (Sweden)

    Jianfang Cao

    2015-01-01

    With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.

  8. Annotation of the Evaluative Language in a Dependency Treebank

    Directory of Open Access Journals (Sweden)

    Šindlerová Jana

    2017-12-01

    In the paper, we present our efforts to annotate evaluative language in the Prague Dependency Treebank 2.0. The project is a follow-up to a series of annotations of small plaintext corpora. It uses automatic identification of potentially evaluative nodes through mapping a Czech subjectivity lexicon to syntactically annotated data. These nodes are then manually checked by an annotator and either dismissed as standing in a non-evaluative context, or confirmed as evaluative. In the latter case, information about the polarity orientation and the source and target of evaluation is added by the annotator. The annotations unveiled several advantages and disadvantages of the chosen framework. The advantages include a more structured and easy-to-handle environment for the annotator, visibility of the syntactic patterning of the evaluative state, effective handling of discontinuous structures, and a new perspective on the influence of good/bad news. The disadvantages include little capability of treating cases with evaluation spread among several syntactically connected nodes at once, little capability of treating metaphorical expressions, and disregard of the effects of negation and intensification in the current scheme.

  9. Comparison of concept recognizers for building the Open Biomedical Annotator

    Directory of Open Access Journals (Sweden)

    Rubin Daniel

    2009-09-01

    The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers – NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service-oriented applications. Based on our analysis we also suggest areas of potential improvement for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
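
    Precision and recall, two of the criteria named in the abstract above, can be illustrated with a minimal sketch. It assumes that both the recognizer output and the reference annotations are available as sets of (document, concept) pairs; the identifiers and the function name are hypothetical and are not taken from MetaMap, Mgrep or the Annotator service.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (document_id, concept_id) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty inputs so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: annotations as (document_id, concept_id) pairs.
gold = {("d1", "C001"), ("d1", "C002"), ("d2", "C003"), ("d2", "C004")}
recognizer_output = {("d1", "C001"), ("d2", "C003"), ("d2", "C999")}

p, r = precision_recall(recognizer_output, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

    Here two of the three predicted pairs are correct and two of the four reference pairs are found, so the recognizer in this toy case favors precision over recall.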

  10. MimoSA: a system for minimotif annotation

    Directory of Open Access Journals (Sweden)

    Kundeti Vamsi

    2010-06-01

    Background: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to

  11. An Updated Functional Annotation of Protein-Coding Genes in the Cucumber Genome

    Directory of Open Access Journals (Sweden)

    Hongtao Song

    2018-03-01

    alternative resource for the functional annotation of predicted cucumber protein-coding genes, which we expect will be beneficial for the cucumber's biological study, accessible from http://cmb.bnu.edu.cn/functional_annotation. Meanwhile, using the cucumber reference genome as a case study, we presented an efficient strategy for transferring gene functional information from previously well-characterized protein-coding genes in model species to newly sequenced or “non-model” plant species.

  12. Eval: A software package for analysis of genome annotations

    Directory of Open Access Journals (Sweden)

    Brent Michael R

    2003-10-01

    Summary: Eval is a flexible tool for analyzing the performance of gene annotation systems. It provides summaries and graphical distributions for many descriptive statistics about any set of annotations, regardless of their source. It also compares sets of predictions to standard annotations and to one another. Input is in the standard Gene Transfer Format (GTF). Eval can be run interactively or via the command line, in which case output options include easily parsable tab-delimited files. Availability: To obtain the module package with documentation, go to http://genes.cse.wustl.edu/ and follow links for Resources, then Software. Please contact brent@cse.wustl.edu
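
    To make the GTF input mentioned above concrete, the following is a minimal sketch of reading GTF lines and computing one simple descriptive statistic (exon count and total exon length per transcript). It is not Eval's code or interface; the function names and the example records are hypothetical, and only the general GTF layout (nine tab-separated fields, with key "value"; pairs in the last field) is assumed.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF line into a small dictionary (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE_PATTERN.findall(attributes)),
    }


def exon_stats(lines):
    """Map transcript_id -> (exon count, total exon length)."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        record = parse_gtf_line(line)
        if record["feature"] != "exon":
            continue
        entry = stats[record["attributes"]["transcript_id"]]
        entry[0] += 1
        entry[1] += record["end"] - record["start"] + 1  # coordinates are 1-based, inclusive
    return {transcript: tuple(values) for transcript, values in stats.items()}


# Hypothetical example records.
example = [
    'chr1\tdemo\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t300\t349\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\tCDS\t120\t199\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr2\tdemo\texon\t10\t19\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
print(exon_stats(example))
```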

  13. JAFA: a protein function annotation meta-server

    DEFF Research Database (Denmark)

    Friedberg, Iddo; Harder, Tim; Godzik, Adam

    2006-01-01

    With the high number of sequences and structures streaming in from genomic projects, there is a need for more powerful and sophisticated annotation tools. Most problematic of the annotation efforts is predicting gene and protein function. Over the past few years there has been considerable progress...... Annotations, or JAFA server. JAFA queries several function prediction servers with a protein sequence and assembles the returned predictions in a legible, non-redundant format. In this manner, JAFA combines the predictions of several servers to provide a comprehensive view of what are the predicted functions...

  14. Roadmap for annotating transposable elements in eukaryote genomes.

    Science.gov (United States)

    Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

    2012-01-01

    Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.

  15. The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program.

    Science.gov (United States)

    Schneider, Michel; Lane, Lydie; Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Bougueleret, Lydie; Bairoch, Amos

    2009-04-13

    The UniProt knowledgebase, UniProtKB, is the main product of the UniProt consortium. It consists of two sections, UniProtKB/Swiss-Prot, the manually curated section, and UniProtKB/TrEMBL, the computer translation of the EMBL/GenBank/DDBJ nucleotide sequence database. Taken together, these two sections cover all the proteins characterized or inferred from all publicly available nucleotide sequences. The Plant Proteome Annotation Program (PPAP) of UniProtKB/Swiss-Prot focuses on the manual annotation of plant-specific proteins and protein families. Our major effort is currently directed towards the two model plants Arabidopsis thaliana and Oryza sativa. In UniProtKB/Swiss-Prot, redundancy is minimized by merging all data from different sources in a single entry. The proposed protein sequence is frequently modified after comparison with ESTs, full length transcripts or homologous proteins from other species. The information present in manually curated entries allows the reconstruction of all described isoforms. The annotation also includes proteomics data such as PTM and protein identification MS experimental results. UniProtKB and the other products of the UniProt consortium are accessible online at www.uniprot.org.

  16. A meta-approach for improving the prediction and the functional annotation of ortholog groups.

    Science.gov (United States)

    Pereira, Cécile; Denise, Alain; Lespinet, Olivier

    2014-01-01

    In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method presents a higher level of accurately predicted groups than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, it is quite conceivable to apply this meta-approach to several combinations of different methods.
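
    The first step described above, building seeds from the overlap between methods, can be sketched as follows. This is only an illustration under the assumption that an "exact overlap" means a group predicted with identical membership by several methods; the method names, gene identifiers and the `min_methods` parameter are hypothetical, and the second step (expansion with HMM profiles) is not shown.

```python
from collections import Counter


def seed_groups(method_results, min_methods=None):
    """Return groups predicted identically by at least `min_methods` methods.

    method_results maps a method name to an iterable of groups,
    each group being an iterable of gene identifiers.
    """
    if min_methods is None:
        min_methods = len(method_results)  # default: all methods must agree
    support = Counter()
    for groups in method_results.values():
        # Use a set per method so a group counts once per method.
        support.update({frozenset(group) for group in groups})
    return {group for group, count in support.items() if count >= min_methods}


# Hypothetical method names and gene identifiers.
results = {
    "method_A": [{"g1", "g2", "g3"}, {"g4", "g5"}, {"g6", "g7"}],
    "method_B": [{"g1", "g2", "g3"}, {"g4", "g5", "g8"}, {"g6", "g7"}],
    "method_C": [{"g1", "g2", "g3"}, {"g4", "g5"}, {"g6"}],
}

all_agree = seed_groups(results)
two_agree = seed_groups(results, min_methods=2)
print(len(all_agree), len(two_agree))
```

    Requiring agreement of all methods gives fewer but more conservative seeds, while lowering the threshold keeps groups that only some methods recover.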

  17. New genes expressed in human brains: implications for annotating evolving genomes.

    Science.gov (United States)

    Zhang, Yong E; Landback, Patrick; Vibranovski, Maria; Long, Manyuan

    2012-11-01

    New genes have frequently formed and spread to fixation in a wide variety of organisms, constituting abundant sets of lineage-specific genes. It was recently reported that an excess of primate-specific and human-specific genes were upregulated in the brains of fetuses and infants, and especially in the prefrontal cortex, which is involved in cognition. These findings reveal the prevalent addition of new genetic components to the transcriptome of the human brain. More generally, these findings suggest that genomes are continually evolving in both sequence and content, eroding the conservation endowed by common ancestry. Despite increasing recognition of the importance of new genes, we highlight here that these genes are still seriously under-characterized in functional studies and that new gene annotation is inconsistent in current practice. We propose an integrative approach to annotate new genes, taking advantage of functional and evolutionary genomic methods. We finally discuss how the refinement of new gene annotation will be important for the detection of evolutionary forces governing new gene origination. Copyright © 2012 WILEY Periodicals, Inc.

  18. Annotated bibliography of the physical data of Rainier Mesa and Yucca Mountain

    International Nuclear Information System (INIS)

    Russell, C.E.

    1988-09-01

    Yucca Mountain, located on and adjacent to the Nevada Test Site (NTS) has been designated as the only site to undergo characterization to determine if it meets the criteria to become the Nation's first high-level nuclear waste repository. During this process, care must be taken to not compromise the site's integrity through excessive testing. In order to supplement the limited data to be gathered at Yucca Mountain, analog areas are to be considered. This annotated bibliography was compiled by the Desert Research Institute to help investigate ways in which Rainier Mesa could either be used as a supplemental repository test site or where existing Rainier Mesa data can be used either to support or refute test results from Yucca Mountain. Rainier Mesa, the location of numerous underground nuclear tests on the NTS, possesses some geologic characteristics similar to those of Yucca Mountain, which makes it a likely candidate for comparison. Almost 500 references regarding geology, hydrology, meteorology, biology, and archaeology were annotated and entered alpha-numerically into the bibliography. These references were categorized into 50 topics which are defined in Section 2 and presented in Section 3. Each reference is categorized as to whether it contains Yucca Mountain data, Rainier Mesa data, or both, and a final category consists of those reports that contain Rainier Mesa data that have already been applied to Yucca Mountain research. The annotated bibliography is presented in Section 4

  19. Automated annotation of mobile antibiotic resistance in Gram-negative bacteria: the Multiple Antibiotic Resistance Annotator (MARA) and database.

    Science.gov (United States)

    Partridge, Sally R; Tsafnat, Guy

    2018-04-01

    Multiresistance in Gram-negative bacteria is often due to acquisition of several different antibiotic resistance genes, each associated with a different mobile genetic element, that tend to cluster together in complex conglomerations. Accurate, consistent annotation of resistance genes, the boundaries and fragments of mobile elements, and signatures of insertion, such as DR, facilitates comparative analysis of complex multiresistance regions and plasmids to better understand their evolution and how resistance genes spread. To extend the Repository of Antibiotic resistance Cassettes (RAC) web site, which includes a database of 'features', and the Attacca automatic DNA annotation system, to encompass additional resistance genes and all types of associated mobile elements. Antibiotic resistance genes and mobile elements were added to RAC, from existing registries where possible. Attacca grammars were extended to accommodate the expanded database, to allow overlapping features to be annotated and to identify and annotate features such as composite transposons and DR. The Multiple Antibiotic Resistance Annotator (MARA) database includes antibiotic resistance genes and selected mobile elements from Gram-negative bacteria, distinguishing important variants. Sequences can be submitted to the MARA web site for annotation. A list of positions and orientations of annotated features, indicating those that are truncated, DR and potential composite transposons is provided for each sequence, as well as a diagram showing annotated features approximately to scale. The MARA web site (http://mara.spokade.com) provides a comprehensive database for mobile antibiotic resistance in Gram-negative bacteria and accurately annotates resistance genes and associated mobile elements in submitted sequences to facilitate comparative analysis.

  20. TOPSAN: use of a collaborative environment for annotating, analyzing and disseminating data on JCSG and PSI structures

    International Nuclear Information System (INIS)

    Krishna, S. Sri; Weekes, Dana; Bakolitsa, Constantina; Elsliger, Marc-André; Wilson, Ian A.; Godzik, Adam; Wooley, John

    2010-01-01

    Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications. The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in

  1. Fluid inclusions in salt: an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Isherwood, D.J.

    1979-01-26

    An annotated bibliography is presented which was compiled while searching the literature for information on fluid inclusions in salt for the Nuclear Regulatory Commission's study on the deep-geologic disposal of nuclear waste. The migration of fluid inclusions in a thermal gradient is a potential hazard to the safe disposal of nuclear waste in a salt repository. At the present time, a prediction as to whether this hazard precludes the use of salt for waste disposal cannot be made. Limited data from the Salt-Vault in situ heater experiments in the early 1960s (Bradshaw and McClain, 1971) leave little doubt that fluid inclusions can migrate towards a heat source. In addition to the bibliography, there is a brief summary of the physical and chemical characteristics that, together with the temperature of the waste, will determine the chemical composition of the brine in contact with the waste canister, the rate of fluid migration, and the brine-canister-waste interactions.

  2. Fluid inclusions in salt: an annotated bibliography

    International Nuclear Information System (INIS)

    Isherwood, D.J.

    1979-01-01

    An annotated bibliography is presented which was compiled while searching the literature for information on fluid inclusions in salt for the Nuclear Regulatory Commission's study on the deep-geologic disposal of nuclear waste. The migration of fluid inclusions in a thermal gradient is a potential hazard to the safe disposal of nuclear waste in a salt repository. At the present time, a prediction as to whether this hazard precludes the use of salt for waste disposal cannot be made. Limited data from the Salt-Vault in situ heater experiments in the early 1960s (Bradshaw and McClain, 1971) leave little doubt that fluid inclusions can migrate towards a heat source. In addition to the bibliography, there is a brief summary of the physical and chemical characteristics that, together with the temperature of the waste, will determine the chemical composition of the brine in contact with the waste canister, the rate of fluid migration, and the brine-canister-waste interactions

  3. Frame on frames: an annotated bibliography

    International Nuclear Information System (INIS)

    Wright, T.; Tsao, H.J.

    1983-01-01

    The success or failure of any sample survey of a finite population is largely dependent upon the condition and adequacy of the list or frame from which the probability sample is selected. Much of the published survey sampling related work has focused on the measurement of sampling errors and, more recently, on nonsampling errors to a lesser extent. Recent studies on data quality for various types of data collection systems have revealed that the extent of the nonsampling errors far exceeds that of the sampling errors in many cases. While much of this nonsampling error, which is difficult to measure, can be attributed to poor frames, relatively little effort or theoretical work has focused on this contribution to total error. The objective of this paper is to present an annotated bibliography on frames with the hope that it will bring together, for experimenters, a number of suggestions for action when sampling from imperfect frames and that more attention will be given to this area of survey methods research

  4. Annotating Human P-Glycoprotein Bioassay Data.

    Science.gov (United States)

    Zdrazil, Barbara; Pinto, Marta; Vasanthanathan, Poongavanam; Williams, Antony J; Balderud, Linda Zander; Engkvist, Ola; Chichester, Christine; Hersey, Anne; Overington, John P; Ecker, Gerhard F

    2012-08-01

    Huge amounts of small compound bioactivity data have been entering the public domain as a consequence of open innovation initiatives. It is now time to carefully analyse existing bioassay data and give it a systematic structure. Our study aims to annotate prominent in vitro assays used for the determination of bioactivities of human P-glycoprotein inhibitors and substrates as they are represented in the ChEMBL and TP-search open source databases. Furthermore, the extent to which data determined in different assays can be combined with each other is explored. As a result of this study, it is suggested that for inhibitors of human P-glycoprotein it is possible to combine data coming from the same assay type, if the cell lines used are also identical and the fluorescent or radiolabeled substrates have overlapping binding sites. In addition, the study demonstrates that there is a need for larger, chemically diverse datasets that have been measured in a panel of different assays. This would certainly facilitate the search for other inter-correlations between bioactivity data yielded by different assay setups.

  5. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

    Science.gov (United States)

    Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

    2017-07-03

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
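
    The pairwise similarity criteria quoted above (sequence identity ≥40% with alignment coverage ≥90%) can be illustrated with a small graph-clustering sketch. It assumes, purely for illustration, that clusters are the connected components of the graph whose edges are the alignments passing both thresholds; the actual BAR 3.0 procedure may differ, and the sequence identifiers, alignment values and function name below are hypothetical.

```python
def cluster_sequences(sequence_ids, alignments, min_identity=40.0, min_coverage=90.0):
    """Group sequences into connected components of the similarity graph.

    alignments is an iterable of (id_a, id_b, identity_percent, coverage_percent).
    """
    parent = {seq: seq for seq in sequence_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in alignments:
        # An edge exists only if both criteria are satisfied.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for seq in sequence_ids:
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Hypothetical identifiers and alignment statistics.
sequence_ids = ["P1", "P2", "P3", "P4", "P5"]
alignments = [
    ("P1", "P2", 55.0, 95.0),
    ("P2", "P3", 41.0, 92.0),
    ("P3", "P4", 80.0, 60.0),  # high identity but insufficient coverage
    ("P4", "P5", 35.0, 99.0),  # high coverage but insufficient identity
]
clusters = cluster_sequences(sequence_ids, alignments)
print(clusters)
```

    In this toy input P4 and P5 pass neither criterion pair with any other sequence and therefore remain singletons, the situation the abstract reports for some 3 399 026 sequences.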

  6. Literature-based concept profiles for gene annotation: the issue of weighting.

    Science.gov (United States)

    Jelier, Rob; Schuemie, Martijn J; Roes, Peter-Jan; van Mulligen, Erik M; Kors, Jan A

    2008-05-01

    Text-mining has been used to link biomedical concepts, such as genes or biological processes, to each other for annotation purposes or the generation of new hypotheses. To relate two concepts to each other several authors have used the vector space model, as vectors can be compared efficiently and transparently. Using this model, a concept is characterized by a list of associated concepts, together with weights that indicate the strength of the association. The associated concepts in the vectors and their weights are derived from a set of documents linked to the concept of interest. An important issue with this approach is the determination of the weights of the associated concepts. Various schemes have been proposed to determine these weights, but no comparative studies of the different approaches are available. Here we compare several weighting approaches in a large scale classification experiment. Three different techniques were evaluated: (1) weighting based on averaging, an empirical approach; (2) the log likelihood ratio, a test-based measure; (3) the uncertainty coefficient, an information-theory based measure. The weighting schemes were applied in a system that annotates genes with Gene Ontology codes. As the gold standard for our study we used the annotations provided by the Gene Ontology Annotation project. Classification performance was evaluated by means of the receiver operating characteristics (ROC) curve using the area under the curve (AUC) as the measure of performance. All methods performed well with median AUC scores greater than 0.84, and scored considerably higher than a binary approach without any weighting. Especially for the more specific Gene Ontology codes excellent performance was observed. The differences between the methods were small when considering the whole experiment. However, the number of documents that were linked to a concept proved to be an important variable. When larger amounts of texts were available for the generation of

  7. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    Directory of Open Access Journals (Sweden)

    Kumar Parijat Tripathi

    RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, translating these extremely large datasets into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report of functional and Gene Ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non-coding RNA (ncRNA) by ab initio methods) helps to identify the non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is a useful software tool for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature and provides an opportunity to add new plugins in the future. Web application is

  8. Systematically profiling and annotating long intergenic non-coding RNAs in human embryonic stem cell.

    Science.gov (United States)

    Tang, Xing; Hou, Mei; Ding, Yang; Li, Zhaohui; Ren, Lichen; Gao, Ge

    2013-01-01

    While more and more long intergenic non-coding RNAs (lincRNAs) have been identified as playing important roles in both maintaining pluripotency and regulating differentiation, how these lincRNAs may define and drive cell fate decisions on a global scale is still mostly elusive. Systematic profiling and comprehensive annotation of embryonic stem cell lincRNAs may not only bring a clearer big picture of these novel regulators but also shed light on their functionalities. Based on multiple RNA-Seq datasets, we systematically identified 300 human embryonic stem cell lincRNAs (hES lincRNAs). Of these, about one fourth (78 out of 300) were further identified as showing biased expression in human ES cells. Functional analysis showed that they were preferentially involved in several biological processes related to early development. Comparative genomics analysis further suggested that around half of the identified hES lincRNAs were conserved in mouse. To facilitate further investigation of these hES lincRNAs, we constructed an online portal for biologists to access all their sequences and annotations interactively. In addition to navigation through a genome browser interface, users can also locate lincRNAs through an advanced query interface based on both keywords and expression profiles, and analyze results through multiple tools. By integrating multiple RNA-Seq datasets, we systematically characterized and annotated 300 hES lincRNAs. A fully functional web portal is freely available at http://scbrowse.cbi.pku.edu.cn. As the first global profiling and annotation of human embryonic stem cell lincRNAs, this work aims to provide a valuable resource for both experimental biologists and bioinformaticians.

  9. MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

    Science.gov (United States)

    Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

    2011-11-01

    The use of Next-Generation Sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge in programming. Copyright © 2011 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  10. Managing and Querying Image Annotation and Markup in XML.

    Science.gov (United States)

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  11. Responsibility in Governmental-Political Communication: A Selected, Annotated Bibliography.

    Science.gov (United States)

    Johannesen, Richard L.

    This annotated bibliography lists 43 books, periodicals, and essays in the area of governmental-political communication. Topics include: social justice, lying, cheating, ethics, public duties, public policy, language, rhetorical strategies, and propaganda. (MS)

  12. Creating New Medical Ontologies for Image Annotation A Case Study

    CERN Document Server

    Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

    2012-01-01

    Creating New Medical Ontologies for Image Annotation focuses on the problem of the medical images automatic annotation process, which is solved in an original manner by the authors. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by authors are compared with other efficient similar algorithms. In addition, the authors treat the problem of creating ontologies in an automatic way, starting from Medical Subject Headings (MESH). They have presented some efficient and relevant annotation models and also the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Based on a text query the system will retrieve the images that contain objects described by the keywords.

  13. Social organization and transportation energy: an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Watts, W. W.

    1974-07-01

    This annotated bibliography lists items organized according to the following themes: (1) fuel consumption and modal split, (2) economics, (3) public decision-making, (4) transportation planning, and (5) effectiveness of municipal services.

  14. Managing and querying image annotation and markup in XML

    Science.gov (United States)

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-03-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  15. Geothermal wetlands: an annotated bibliography of pertinent literature

    Energy Technology Data Exchange (ETDEWEB)

    Stanley, N.E.; Thurow, T.L.; Russell, B.F.; Sullivan, J.F.

    1980-05-01

    This annotated bibliography covers the following topics: algae, wetland ecosystems; institutional aspects; macrophytes - general, production rates, and mineral absorption; trace metal absorption; wetland soils; water quality; and other aspects of marsh ecosystems. (MHR)

  16. Annotated bibliography of South African indigenous evergreen forest ecology

    CSIR Research Space (South Africa)

    Geldenhuys, CJ

    1985-01-01

    Full Text Available Annotated references to 519 publications are presented, together with keyword listings and keyword, regional, place name and taxonomic indices. This bibliography forms part of the first phase of the activities of the Forest Biome Task Group....

  17. Annotating Evidence Based Clinical Guidelines : A Lightweight Ontology

    NARCIS (Netherlands)

    Hoekstra, R.; de Waard, A.; Vdovjak, R.; Paschke, A.; Burger, A.; Romano, P.; Marshall, M.S.; Splendiani, A.

    2012-01-01

    This paper describes a lightweight ontology for representing annotations of declarative evidence based clinical guidelines. We present the motivation and requirements for this representation, based on an analysis of several guidelines. The ontology provides the means to connect clinical questions

  18. 06491 Summary -- Digital Historical Corpora- Architecture, Annotation, and Retrieval

    OpenAIRE

    Burnard, Lou; Dobreva, Milena; Fuhr, Norbert; Lüdeling, Anke

    2007-01-01

    The seminar "Digital Historical Corpora" brought together scholars from (historical) linguistics, (historical) philology, computational linguistics and computer science who work with collections of historical texts. The issues that were discussed include digitization, corpus design, corpus architecture, annotation, search, and retrieval.

  19. eXamine: Exploring annotated modules in networks

    NARCIS (Netherlands)

    Dinkla, K.; El-Kebir, M.; Bucur, C.I.; Siderius, M.; Smit, M.J.; Westenberg, M.A.; Klau, G.W.

    2014-01-01

    Background: Biological networks have a growing importance for the interpretation of high-throughput "omics" data. Integrative network analysis makes use of statistical and combinatorial methods to extract smaller subnetwork modules, and performs enrichment analysis to annotate the modules with

  20. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Abstract Background: Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results: We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS values, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. 
The degree of accumulated knowledge for a

  1. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
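
    The idea of combining evidence from several programs can be sketched as merging overlapping hits on one sequence while recording which tools support each merged span. This is a simplified illustration only, not the pipeline's actual algorithm; the function name and the coordinates are hypothetical, and the tool names are reused from the abstract purely as labels.

```python
def merge_evidence(hits):
    """Merge overlapping (start, end, source) hits on a single sequence.

    Returns a list of (start, end, sources) tuples, where `sources` is the
    sorted list of tools whose hits fall inside the merged span.
    """
    merged = []
    # Sorting by start coordinate lets each hit be compared only with
    # the most recently built span.
    for start, end, source in sorted(hits):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, sources = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), sources | {source})
        else:
            merged.append((start, end, {source}))
    return [(s, e, sorted(src)) for s, e, src in merged]


# Hypothetical hits; coordinates are invented for the example.
example_hits = [
    (100, 200, "RepeatMasker"),
    (150, 260, "RECON"),
    (400, 450, "BLASTER"),
]
combined = merge_evidence(example_hits)
```

    Spans supported by more than one source could then be given priority during manual curation, which is one plausible use of such a merge.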

  2. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    OpenAIRE

    Hamed Hassanzadeh; MohammadReza Keyvanpour

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as ...

  3. MIRACLE’s Naive Approach to Medical Images Annotation

    OpenAIRE

    Villena Román, Julio; González Cristóbal, José Carlos; Goñi Menoyo, José Miguel; Martínez Fernández, José Luis

    2005-01-01

    One of the proposed tasks of the ImageCLEF 2005 campaign has been an Automatic Annotation Task. The objective is to provide the classification of a given set of 1,000 previously unseen medical (radiological) images according to 57 predefined categories covering different medical pathologies. 9,000 classified training images are given which can be used in any way to train a classifier. The Automatic Annotation task uses no textual information, but image-content information only. This paper des...

  4. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  5. Genome Annotation and Transcriptomics of Oil-Producing Algae

    Science.gov (United States)

    2015-03-16

    AFRL-OSR-VA-TR-2015-0103 GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE Sabeeha Merchant UNIVERSITY OF CALIFORNIA LOS ANGELES Final...2010 To 12-31-2014 4. TITLE AND SUBTITLE GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE 5a. CONTRACT NUMBER FA9550-10-1-0095 5b...NOTES 14. ABSTRACT Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some

  6. Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

    Directory of Open Access Journals (Sweden)

    Danuta Roszko

    2015-06-01

    Full Text Available In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT), built to support Polish-Lithuanian theoretical contrastive studies and a Polish-Lithuanian electronic dictionary, and to assist sworn translators. The semantic annotation introduced into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.

  7. Analysis of LYSA-calculus with explicit confidentiality annotations

    DEFF Research Database (Denmark)

    Gao, Han; Nielson, Hanne Riis

    2006-01-01

    Recently there has been an increased research interest in applying process calculi in the verification of cryptographic protocols due to their ability to formally model protocols. This work presents LYSA with explicit confidentiality annotations for indicating the expected behavior of target...... malicious activities performed by attackers as specified by the confidentiality annotations. The proposed analysis approach is fully automatic without the need of human intervention and has been applied successfully to a number of protocols....

  8. JAABA: interactive machine learning for automatic annotation of animal behavior

    OpenAIRE

    Kabra, Mayank; Robie, Alice A; Rivera-Alba, Marta; Branson, Steven; Branson, Kristin

    2013-01-01

    We present a machine learning-based system for automatically computing interpretable, quantitative measures of animal behavior. Through our interactive system, users encode their intuition about behavior by annotating a small set of video frames. These manual labels are converted into classifiers that can automatically annotate behaviors in screen-scale data sets. Our general-purpose system can create a variety of accurate individual and social behavior classifiers for different organisms, in...

  9. Annotation Method (AM): SE6_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE6_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
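
    The two-stage lookup described in this record (primary databases first, then secondary databases only for peaks without a hit) can be sketched as below. The record does not say how a peak is matched to a database entry, so matching by mass within a tolerance is an assumption; the function name, tolerance and toy masses are all hypothetical, and the database names are used only as dictionary keys.

```python
def annotate_peaks(peak_masses, primary_dbs, secondary_dbs, tol=0.01):
    """Annotate peaks with a primary search, falling back to a secondary one.

    Each database is a dict {compound_name: reference_mass}. Matching by
    absolute mass difference <= tol is an assumption made for this sketch.
    """
    def search(mass, dbs):
        return [
            (db_name, compound)
            for db_name, entries in dbs.items()
            for compound, ref_mass in entries.items()
            if abs(mass - ref_mass) <= tol
        ]

    annotations = {}
    for mass in peak_masses:
        hits = search(mass, primary_dbs)
        stage = "primary"
        if not hits:
            # Only peaks with no primary hit go to the secondary databases.
            hits = search(mass, secondary_dbs)
            stage = "secondary" if hits else "unannotated"
        annotations[mass] = (stage, hits)
    return annotations


# Toy data; the masses are illustrative values, not curated references.
primary = {"KEGG": {"glucose": 180.063}}
secondary = {"exactMassDB": {"C5H10O5": 150.053}}
result = annotate_peaks([180.06, 150.05, 99.0], primary, secondary)
```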

  10. Annotation Method (AM): SE7_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE7_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  11. Annotation Method (AM): SE28_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE28_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  12. Annotation Method (AM): SE1_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE1_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  13. Annotation Method (AM): SE20_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE20_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  14. Annotation Method (AM): SE17_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE17_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  15. Annotation Method (AM): SE2_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE2_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  16. Annotation Method (AM): SE9_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE9_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  17. Annotation Method (AM): SE27_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE27_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  18. Annotation Method (AM): SE30_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE30_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  19. Annotation Method (AM): SE33_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE33_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  20. Annotation Method (AM): SE32_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE32_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  1. Annotation Method (AM): SE12_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE12_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  2. Annotation Method (AM): SE3_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE3_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  3. Annotation Method (AM): SE31_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE31_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  4. Annotation Method (AM): SE25_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE25_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  5. Annotation Method (AM): SE13_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE13_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  6. Annotation Method (AM): SE8_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE8_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  7. Annotation Method (AM): SE34_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE34_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  8. Annotation Method (AM): SE35_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE35_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  9. Annotation Method (AM): SE14_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE14_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  10. Annotation Method (AM): SE29_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE29_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  11. Annotation Method (AM): SE36_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE36_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  12. Annotation Method (AM): SE5_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE5_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  13. Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE11_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  14. Annotation Method (AM): SE16_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE16_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  15. Annotation Method (AM): SE26_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE26_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  16. Annotation Method (AM): SE4_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE4_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  17. Annotation Method (AM): SE10_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE10_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  18. Annotation Method (AM): SE15_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available SE15_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected to secondary search using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  19. Annotated outline for the SCP conceptual design report: Office of Geologic Repositories

    International Nuclear Information System (INIS)

    1987-06-01

    The Nuclear Waste Policy Act of 1982 (NWPA) requires that site characterization plans (SCPs) be submitted to the Nuclear Regulatory Commission (NRC), affected States and Indian tribes, and the general public for review and comment prior to the sinking of shafts at a candidate repository site. The SCP is also required by the NRC licensing procedures for the disposal of high-level waste. An Annotated Outline (AO) for Site Characterization Plans (OGR/B-5) has been prepared to provide DOE's standard format and guidance for preparation of SCPs. Consistent with the AO for SCPs, Chapter 6 of the SCP is to provide the requirements, reference the media-specific design data base, describe the current design concepts, and discuss design information needs. In order to develop this design information, the Office of Geologic Repositories program is planning an SCP conceptual design phase as part of the overall repository design process. This phase is the first step in the design process, and the result and design can be expected to change as the program moves through the site characterization phase. The Annotated Outline which follows provides the standard format and guidance for the preparation of the SCP Conceptual Design Reports. It is considered to meet the intent of NRC's proposed Generic Technical Position philosophy contained therein. The SCP Conceptual Design Report will be the primary basis for preparation of Chapter 6 of the SCP and will be a stand-alone reference document for the SCP. Appendix 1 to this Annotated Outline provides a correlation between Chapter 6 of the SCP and the SCP Conceptual Design Report for information purposes.

  20. A survey on annotation tools for the biomedical literature.

    Science.gov (United States)

    Neves, Mariana; Leser, Ulf

    2014-03-01

    New approaches to biomedical text mining crucially depend on the existence of comprehensive annotated corpora. Such corpora, commonly called gold standards, are important for learning patterns or models during the training phase, for evaluating and comparing the performance of algorithms and also for better understanding the information sought by means of examples. Gold standards depend on human understanding and manual annotation of natural language text. This process is very time-consuming and expensive because it requires high intellectual effort from domain experts. Accordingly, the lack of gold standards is considered one of the main bottlenecks for developing novel text mining methods. This situation led to the development of tools that support humans in annotating texts. Such tools should be intuitive to use, should support a range of different input formats, should include visualization of annotated texts and should generate an easy-to-parse output format. Today, a range of tools which implement some of these functionalities are available. In this article, we present a comprehensive survey of tools for supporting annotation of biomedical texts. Altogether, we considered almost 30 tools, 13 of which were selected for an in-depth comparison. The comparison was performed using predefined criteria and was accompanied by hands-on experiences whenever possible. Our survey shows that current tools can support many of the tasks in biomedical text annotation in a satisfying manner, but also that no tool can be considered a truly comprehensive solution.

  1. GENCODE: the reference human genome annotation for The ENCODE Project.

    Science.gov (United States)

    Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J

    2012-09-01

    The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

  2. Exploiting ontology graph for predicting sparsely annotated gene function.

    Science.gov (United States)

    Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian

    2015-06-15

    Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, for functional labels with only a few annotated genes, any algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this 'overfitting' issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. Availability: https://github.com/wangshenguiuc/clusDCA. © The Author 2015. Published by Oxford University Press.

  3. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  4. GENCODE: The reference human genome annotation for The ENCODE Project

    Science.gov (United States)

    Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M.; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L.; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J.

    2012-01-01

    The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers. PMID:22955987

  5. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

    Directory of Open Access Journals (Sweden)

    Qiongshi Lu

    2017-07-01

    Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.

  6. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

    Science.gov (United States)

    Lu, Qiongshi; Powles, Ryan L; Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-07-01

    Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.

  7. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease

    Science.gov (United States)

    Lu, Qiongshi; Powles, Ryan L.; Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K.; Zhao, Hongyu

    2017-01-01

    Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer’s disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson’s disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline. PMID:28742084

  8. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    ... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms... -annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.

  9. Experiments with crowdsourced re-annotation of a POS tagging data set

    DEFF Research Database (Denmark)

    Hovy, Dirk; Plank, Barbara; Søgaard, Anders

    2014-01-01

    Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks.

  10. Gastrointestinal hormone research - with a Scandinavian annotation.

    Science.gov (United States)

    Rehfeld, Jens F

    2015-06-01

    Gastrointestinal hormones are peptides released from neuroendocrine cells in the digestive tract. More than 30 hormone genes are currently known to be expressed in the gut, which makes it the largest hormone-producing organ in the body. Modern biology makes it feasible to conceive the hormones under five headings: The structural homology groups a majority of the hormones into nine families, each of which is assumed to originate from one ancestral gene. The individual hormone gene often has multiple phenotypes due to alternative splicing, tandem organization or differentiated posttranslational maturation of the prohormone. By a combination of these mechanisms, more than 100 different hormonally active peptides are released from the gut. Gut hormone genes are also widely expressed outside the gut, some only in extraintestinal endocrine cells and cerebral or peripheral neurons but others also in other cell types. The extraintestinal cells may release different bioactive fragments of the same prohormone due to cell-specific processing pathways. Moreover, endocrine cells, neurons, cancer cells and, for instance, spermatozoa secrete gut peptides in different ways, so the same peptide may act as a blood-borne hormone, a neurotransmitter, a local growth factor or a fertility factor. The targets of gastrointestinal hormones are specific G-protein-coupled receptors that are expressed in the cell membranes also outside the digestive tract. Thus, gut hormones not only regulate digestive functions, but also constitute regulatory systems operating in the whole organism. This overview of gut hormone biology is supplemented with an annotation on some Scandinavian contributions to gastrointestinal hormone research.

  11. Deep Question Answering for protein annotation.

    Science.gov (United States)

    Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

    2015-01-01

    Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. From this perspective, question-answering (QA) engines are designed to display answers that were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.
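    As a reading aid for the "+100% improvement for both recall and precision" figure in the abstract above, the following is a minimal sketch of how precision, recall, and a relative improvement are usually computed for sets of predicted GO concepts. The function names, the example GO identifier sets, and the resulting numbers are illustrative assumptions and are not taken from the paper.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted items
    against a set of relevant (curated) items."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def relative_improvement(old, new):
    """Relative change from old to new, e.g. 1.0 means +100%."""
    return (new - old) / old


# Hypothetical curated answers and two hypothetical system outputs.
curated = {"GO:0006915", "GO:0008283", "GO:0016301", "GO:0005515"}
baseline = {"GO:0006915", "GO:0003677", "GO:0005634", "GO:0046872"}
improved = {"GO:0006915", "GO:0008283", "GO:0003677", "GO:0005634"}

p_old, r_old = precision_recall(baseline, curated)
p_new, r_new = precision_recall(improved, curated)

print(f"baseline: precision={p_old:.2f} recall={r_old:.2f}")
print(f"improved: precision={p_new:.2f} recall={r_new:.2f}")
print(f"precision gain: {relative_improvement(p_old, p_new):+.0%}")
print(f"recall gain:    {relative_improvement(r_old, r_new):+.0%}")
```

    In this toy case one correct concept out of four becomes two out of four, so both measures double, which is what a +100% relative improvement means.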

  12. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Science.gov (United States)

    Bose, Tungadri; Haque, Mohammed Monzoorul; Reddy, Cvsk; Mande, Sharmila S

    2015-01-01

    Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the flexibility of

  13. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Tungadri Bose

    Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the

  14. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
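    The GC3 measure defined in the abstract above (the fraction of cytosine and guanine in the third position of a codon) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name, the handling of a trailing incomplete codon, and the example sequence are assumptions made here; only the 0.75286 cut-off comes from the abstract.

```python
GC3_RICH_THRESHOLD = 0.75286  # cut-off for "GC3-rich" quoted in the abstract


def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence; bases left over
    after the last complete codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    third_bases = seq[2:usable:3]  # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical five-codon sequence: ATG GCC AAG TTT GGC
example = "ATGGCCAAGTTTGGC"
value = gc3_fraction(example)
print(value, value >= GC3_RICH_THRESHOLD)
```

    Four of the five codons in the example end in G or C, so the sequence would count as GC3-rich under the quoted threshold.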

  15. Graph-based sequence annotation using a data integration approach

    Directory of Open Access Journals (Sweden)

    Pesch Robert

    2008-06-01

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.

  16. Graph-based sequence annotation using a data integration approach.

    Science.gov (United States)

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  17. Dizeez: an online game for human gene-disease annotation.

    Science.gov (United States)

    Loguercio, Salvatore; Good, Benjamin M; Su, Andrew I

    2013-01-01

    Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org.

  18. Statistical algorithms for ontology-based annotation of scientific literature.

    Science.gov (United States)

    Chakrabarti, Chayan; Jones, Thomas B; Luger, George F; Xu, Jiawei F; Turner, Matthew D; Laird, Angela R; Turner, Jessica A

    2014-01-01

    Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they render an analysis mainly based upon keyword, synonym and semantic matching. They do not leverage information embedded in an ontology's structure. We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge. We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, in terms of the quality measure of the F1-micro score. Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has explicitly stated restrictions, and implicit dependencies in the form of patterns in the expert curated annotations.
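
As a point of reference for the quality measure named in this abstract, the following is a minimal Python sketch of a micro-averaged F1 score for multi-label annotation. It is not the authors' evaluation code: the function name `micro_f1` and the set-of-labels representation are assumptions made for illustration. Micro-averaging pools true positives, false positives and false negatives over all documents and labels before computing F1.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over paired gold/predicted label sets.

    Counts are pooled across all documents, so frequent labels weigh
    more than rare ones. Returns 0.0 when there are no labels at all.
    """
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)   # correctly assigned
        fp += len(predicted_labels - gold_labels)   # assigned but wrong
        fn += len(gold_labels - predicted_labels)   # missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels, for illustration only.
gold_annotations = [{"a", "b"}, {"c"}]
predicted_annotations = [{"a"}, {"c", "d"}]
score = micro_f1(gold_annotations, predicted_annotations)
```

In this toy case two labels are correct, one is spurious and one is missed, so the pooled precision and recall are both 2/3.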

  19. DisBlue+: A distributed annotation-based C# compiler

    Directory of Open Access Journals (Sweden)

    Samir E. AbdelRahman

    2010-06-01

    Full Text Available Many programming languages utilize annotations to add useful information to the program, but they still result in more tokens to be compiled and hence slower compilation time. Any current distributed compiler breaks the program into scattered disjoint pieces to speed up the compilation. However, these pieces cooperate synchronously and depend highly on each other. This causes massive overhead since messages, symbols, or codes must travel across the network. This paper presents two promising compilers named annotation-based C# (Blue+) and distributed annotation-based C# (DisBlue+). The proposed Blue+ annotation is based on axiomatic semantics to replace the if/loop constructs. As the developer tends to use many (complex) conditions and repeat them in the program, such annotations reduce the compilation scanning time and increase the readability of the whole code. Built on top of Blue+, DisBlue+ presents its proposed distributed concept, which is to divide each program class into its prototype and definition, as disjoint distributed pieces, such that each class definition is compiled with only its related compiled prototypes (interfaces). Such a concept reduces the amount of code transferred over the network, minimizes the dependencies among the disjoint pieces, and removes any possible synchronization between them. To test their efficiency, Blue+ and DisBlue+ were verified with large-size codes against some existing compilers, namely Javac, DJavac, and CDjava.

  20. Automatic annotation of lecture videos for multimedia driven pedagogical platforms

    Directory of Open Access Journals (Sweden)

    Ali Shariq Imran

    2016-12-01

    Full Text Available Today’s eLearning websites are heavily loaded with multimedia content, which is often unstructured, unedited, unsynchronized, and lacking inter-links among different multimedia components. Hyperlinking different media modalities may provide a solution for quick navigation and easy retrieval of pedagogical content in media driven eLearning websites. In addition, finding meta-data information to describe and annotate media content in eLearning platforms is a challenging, laborious, error-prone, and time-consuming task. Thus annotations for multimedia, especially lecture videos, have become an important part of video learning objects. To address this issue, this paper proposes three major contributions, namely automated video annotation, 3-Dimensional (3D) tag clouds, and the hyper interactive presenter (HIP) eLearning platform. Combining the existing state-of-the-art SIFT with a tag cloud, a novel approach for automatic lecture video annotation for the HIP is proposed. New video annotations are implemented automatically, providing the needed random access in lecture videos within the platform, and a 3D tag cloud is proposed as a new user interaction mechanism. A preliminary study of the usefulness of the system has been carried out, and the initial results suggest that 70% of the students opted for using HIP as their preferred eLearning platform at Gjøvik University College (GUC).

  1. Annotation of nerve cord transcriptome in earthworm Eisenia fetida

    Directory of Open Access Journals (Sweden)

    Vasanthakumar Ponesakki

    2017-12-01

    Full Text Available In annelid worms, the nerve cord serves as a crucial organ to control the sensory and behavioral physiology. The inadequate genome resource of earthworms has prioritized the comprehensive analysis of their transcriptome dataset to monitor the genes expressed in the nerve cord and predict their role in the neurotransmission and sensory perception of the species. The present study focuses on identifying the potential transcripts and predicting their functional features by annotating the transcriptome dataset of nerve cord tissues prepared by Gong et al., 2010 from the earthworm Eisenia fetida. In total, 9762 transcripts were successfully annotated against the NCBI nr database using the BLASTX algorithm, and among them 7680 transcripts were assigned to a total of 44,354 GO terms. The conserved domain analysis indicated the over-representation of the P-loop NTPase domain and the calcium binding EF-hand domain. The COG functional annotation classified 5860 transcript sequences into 25 functional categories. Further, 4502 contig sequences were found to map to 124 KEGG pathways. The annotated contig dataset exhibited 22 crucial neuropeptides having considerable matches to the marine annelid Platynereis dumerilii, suggesting their possible role in neurotransmission and neuromodulation. In addition, 108 human stem cell marker homologs were identified, including the crucial epigenetic regulators, transcriptional repressors and cell cycle regulators, which may contribute to the neuronal and segmental regeneration. The complete functional annotation of this nerve cord transcriptome can be further utilized to interpret genetic and molecular mechanisms associated with neuronal development, nervous system regeneration and nerve cord function.

  2. Annotating abstract pronominal anaphora in the DAD project

    DEFF Research Database (Denmark)

    Navarretta, Costanza; Olsen, Sussi Anni

    2008-01-01

    In this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The extended scheme, which we call the DAD annotation scheme, allows to annotate information about abstract anaphora which is important to investigate their use, see Webber (1988), Gundel et al. (2003), Navarretta (2004) and which can influence their automatic treatment. Intercoder agreement scores obtained by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that the information proposed in the scheme can be recognised in a reliable way.

  3. Fast Arc-Annotated Subsequence Matching in Linear Space

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li

    2012-01-01

    An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base is deleted any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is basic primitive...... the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated...

  4. Fast Arc-Annotated Subsequence Matching in Linear Space

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li

    2010-01-01

    An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base is deleted any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is basic primitive...... the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated...

  5. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

    Directory of Open Access Journals (Sweden)

    Chuming Chen

    2011-04-01

    Full Text Available The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT is also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.

  6. Use of Annotations for Component and Framework Interoperability

    Science.gov (United States)

    David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

    2009-12-01

    The popular programming languages Java and C# provide annotations, a form of meta-data construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a meta-data standard for components and models. In version 3.0, framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively, such as implicit multithreading and auto-documenting capabilities, while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to the framework by the use of specific APIs and/or data types, they can more easily be reused both within the framework and outside of it. To study the effectiveness of an annotation based framework approach with other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. In a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the

  7. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  8. Annotating abstract pronominal anaphora in the DAD project

    DEFF Research Database (Denmark)

    Navarretta, Costanza; Olsen, Sussi Anni

    2008-01-01

    In this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments...... by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that the information proposed in the scheme can be recognised in a reliable way....

  9. META2: Intercellular DNA Methylation Pairwise Annotation and Integrative Analysis

    Directory of Open Access Journals (Sweden)

    Binhua Tang

    2016-01-01

    Full Text Available Genome-wide deciphering of intercellular differential DNA methylation, as well as its roles in transcriptional regulation, remains elusive in cancer epigenetics. Here we developed a toolkit, META2, for DNA methylation annotation and analysis, which aims to perform integrative analysis on differentially methylated loci and regions through deep mining and statistical comparison methods. META2 contains multiple versatile functions for investigating and annotating DNA methylation profiles. Benchmarked with the T-47D cell, we interrogated the association within differentially methylated CpG (DMC) and region (DMR) candidate count and region length and identified major transition zones as clues for inferring statistically significant DMRs; together we validated those DMRs with the functional annotation. Thus META2 can provide a comprehensive analysis approach for epigenetic research and clinical study.

  10. Code Generation for Protocols from CPN models Annotated with Pragmatics

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge; Kristensen, Lars Michael; Kindler, Ekkart

    of the same model and sufficiently detailed to serve as a basis for automated code generation when annotated with code generation pragmatics. Pragmatics are syntactical annotations designed to make the CPN models descriptive and to address the problem that models with enough details for generating code from...... them tend to be verbose and cluttered. Our code generation approach consists of three main steps, starting from a CPN model that the modeller has annotated with a set of pragmatics that make the protocol structure and the control-flow explicit. The first step is to compute for the CPN model, a set...... of derived pragmatics that identify control-flow structures and operations, e.g., for sending and receiving packets, and for manipulating the state. In the second step, an abstract template tree (ATT) is constructed providing an association between pragmatics and code generation templates. The ATT...

  11. Processing sequence annotation data using the Lua programming language.

    Science.gov (United States)

    Ueno, Yutaka; Arita, Masanori; Kumagai, Toshitaka; Asai, Kiyoshi

    2003-01-01

    The data processing language in a graphical software tool that manages sequence annotation data from genome databases should provide flexible functions for the tasks in molecular biology research. Among currently available languages we adopted the Lua programming language. It fulfills our requirements to perform computational tasks for sequence map layouts, i.e. the handling of data containers, symbolic reference to data, and a simple programming syntax. Upon importing a foreign file, the original data are first decomposed in the Lua language while maintaining the original data schema. The converted data are parsed by the Lua interpreter and the contents are stored in our data warehouse. Then, portions of annotations are selected and arranged into our catalog format to be depicted on the sequence map. Our sequence visualization program was successfully implemented, embedding the Lua language for processing of annotation data and layout script. The program is available at http://staff.aist.go.jp/yutaka.ueno/guppy/.

  12. An Atlas of annotations of Hydra vulgaris transcriptome.

    Science.gov (United States)

    Evangelista, Daniela; Tripathi, Kumar Parijat; Guarracino, Mario Rosario

    2016-09-22

    RNA sequencing takes advantage of Next Generation Sequencing (NGS) technologies for analyzing RNA transcript counts with excellent accuracy. Interpreting this huge amount of data as biological information is still a key issue, which is why the creation of web resources useful for its analysis is highly desirable. Starting from a previous work, Transcriptator, we present the Atlas of Hydra vulgaris, an extensible web tool in which its complete transcriptome is annotated. In order to provide users with an advantageous resource that includes the whole functionally annotated transcriptome of the Hydra vulgaris water polyp, we implemented the Atlas web tool, which contains 31,988 accessible and downloadable transcripts of this non-reference model organism. Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotation for transcripts differentially expressed in Hydra vulgaris exposed to distinct experimental treatments. WEB RESOURCE URL: http://www-labgtp.na.icar.cnr.it/Atlas .

  13. Rfam: annotating families of non-coding RNA sequences.

    Science.gov (United States)

    Daub, Jennifer; Eberhardt, Ruth Y; Tate, John G; Burge, Sarah W

    2015-01-01

    The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.

  14. Automatically Annotated Mapping for Indoor Mobile Robot Applications

    DEFF Research Database (Denmark)

    Özkil, Ali Gürcan; Howard, Thomas J.

    2012-01-01

    This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...... localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally consistent, automatically annotated hybrid metric-topological maps that are needed by mobile service robots....

  15. Metafier - a Tool for Annotating and Structuring Building Metadata

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Johansen, Aslak; Kjærgaard, Mikkel Baun

    2017-01-01

    , describing the instrumentation of the building. We have created Metafier, a tool for annotating and structuring metadata for buildings. Metafier optimizes the workflow of establishing metadata for buildings by enabling a human-in-the-loop to validate, search and group points. We have evaluated Metafier for two buildings, with different sizes, locations, ages and purposes. The evaluation was performed as a user test with three subjects with different backgrounds. The evaluation results indicate that the tool enabled the users to validate, search and group points while annotating metadata. One challenge is to get users to understand the concept of metadata for the tool to be useable. Based on our evaluation, we have listed guidelines for creating a tool for annotating building metadata....

  16. Image annotation based on positive-negative instances learning

    Science.gov (United States)

    Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping

    2017-07-01

    Automatic image annotation remains a difficult task in computer vision; its main purpose is to help manage the massive number of images on the Internet and to assist intelligent retrieval. This paper designs a new image annotation model based on a visual bag of words, using low-level features such as color and texture information as well as mid-level features such as SIFT, and combining the pic2pic, label2pic and label2label correlations to measure the degree of correlation between labels and images. We aim to prune the specific features for each single label and formalize the annotation task as a learning process based on Positive-Negative Instances Learning. Experiments performed on the Corel5K dataset give quite promising results compared with other existing methods.

  17. Annotating smart environment sensor data for activity learning.

    Science.gov (United States)

    Szewcyzk, S; Dwan, K; Minor, B; Swedlove, B; Cook, D

    2009-01-01

    The pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track the activities that people perform at home. Machine learning techniques can perform this task, but the software algorithms rely upon large amounts of sample data that is correctly labeled with the corresponding activity. Labeling, or annotating, sensor data with the corresponding activity can be time consuming, may require input from the smart home resident, and is often inaccurate. Therefore, in this paper we investigate four alternative mechanisms for annotating sensor data with a corresponding activity label. We evaluate the alternative methods along the dimensions of annotation time, resident burden, and accuracy using sensor data collected in a real smart apartment.

  18. Automatically Annotated Mapping for Indoor Mobile Robot Applications

    DEFF Research Database (Denmark)

    Özkil, Ali Gürcan; Howard, Thomas J.

    2012-01-01

    This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...... localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally consistent, automatically annotated hybrid metric-topological maps that are needed by mobile service robots....

  19. Tagging like Humans: Diverse and Distinct Image Annotation

    KAUST Repository

    Wu, Baoyuan

    2018-03-31

    In this work we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D2IA). The generative model D2IA is inspired by the ensemble of human annotations, which create semantically relevant, yet distinct and diverse tags. In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model. Multiple such tag subsets that cover diverse semantic aspects or diverse semantic levels of the image contents are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments including quantitative and qualitative comparisons, as well as human subject studies, on two benchmark datasets demonstrate that the proposed model can produce more diverse and distinct tags than state-of-the-art methods.

  20. BRILIA: Integrated Tool for High-Throughput Annotation and Lineage Tree Assembly of B-Cell Repertoires

    Science.gov (United States)

    Lee, Donald W.; Khavrutskii, Ilja V.; Wallqvist, Anders; Bavari, Sina; Cooper, Christopher L.; Chaudhury, Sidhartha

    2017-01-01

    The somatic diversity of antigen-recognizing B-cell receptors (BCRs) arises from Variable (V), Diversity (D), and Joining (J) (VDJ) recombination and somatic hypermutation (SHM) during B-cell development and affinity maturation. The VDJ junction of the BCR heavy chain forms the highly variable complementarity determining region 3 (CDR3), which plays a critical role in antigen specificity and binding affinity. Tracking the selection and mutation of the CDR3 can be useful in characterizing humoral responses to infection and vaccination. Although tens to hundreds of thousands of unique BCR genes within an expressed B-cell repertoire can now be resolved with high-throughput sequencing, tracking SHMs is still challenging because existing annotation methods are often limited by poor annotation coverage, inconsistent SHM identification across the VDJ junction, or lack of B-cell lineage data. Here, we present B-cell repertoire inductive lineage and immunosequence annotator (BRILIA), an algorithm that leverages repertoire-wide sequencing data to globally improve the VDJ annotation coverage, lineage tree assembly, and SHM identification. On benchmark tests against simulated human and mouse BCR repertoires, BRILIA correctly annotated germline and clonally expanded sequences with 94 and 70% accuracy, respectively, and it has a 90% SHM-positive prediction rate in the CDR3 of heavily mutated sequences; these are substantial improvements over existing methods. We used BRILIA to process BCR sequences obtained from splenic germinal center B cells extracted from C57BL/6 mice. BRILIA returned robust B-cell lineage trees and yielded SHM patterns that are consistent across the VDJ junction and agree with known biological mechanisms of SHM. By contrast, existing BCR annotation tools, which do not account for repertoire-wide clonal relationships, systematically underestimated both the size of clonally related B-cell clusters and yielded inconsistent SHM frequencies. We demonstrate

  1. Smart Annotation of Cyclic Data Using Hierarchical Hidden Markov Models

    Directory of Open Access Journals (Sweden)

    Christine F. Martindale

    2017-10-01

    Full Text Available Cyclic signals, such as human motion and heart activity, are an intrinsic part of daily life. Their detailed analysis is important for clinical applications such as pathological gait analysis and for sports applications such as performance analysis. Labeled training data for algorithms that analyze these cyclic data come at a high annotation cost, because only limited annotations are available under laboratory conditions and data collected under less restricted conditions require manual segmentation. This paper presents a smart annotation method that reduces this cost of labeling for sensor-based data and is applicable to data collected outside of strict laboratory conditions. The method uses semi-supervised learning of sections of cyclic data with a known cycle number. A hierarchical hidden Markov model (hHMM) is used, achieving a mean absolute error of 0.041 ± 0.020 s relative to a manually-annotated reference. The resulting model was also used to simultaneously segment and classify continuous, ‘in the wild’ data, demonstrating the applicability of using hHMM, trained on limited data sections, to label a complete dataset. This technique achieved comparable results to its fully-supervised equivalent. Our semi-supervised method has the significant advantage of reduced annotation cost. Furthermore, it reduces the opportunity for human error in the labeling process normally required for training of segmentation algorithms. It also lowers the annotation cost of training a model capable of continuous monitoring of cycle characteristics such as those employed to analyze the progress of movement disorders or analysis of running technique.

  2. Incorporating Non-Coding Annotations into Rare Variant Analysis.

    Directory of Open Access Journals (Sweden)

    Tom G Richardson

    Full Text Available The success of collapsing methods which investigate the combined effect of rare variants on complex traits has so far been limited. The manner in which variants within a gene are selected prior to analysis has a crucial impact on this success, which has resulted in analyses conventionally filtering variants according to their consequence. This study investigates whether an alternative approach to filtering, using annotations from recently developed bioinformatics tools, can aid these types of analyses in comparison to conventional approaches. We conducted a candidate gene analysis using the UK10K sequence and lipids data, filtering according to functional annotations using the resource CADD (Combined Annotation-Dependent Depletion) and contrasting results with 'nonsynonymous' and 'loss of function' consequence analyses. Using CADD allowed the inclusion of potentially deleterious intronic variants, which was not possible when filtering by consequence. Overall, different filtering approaches provided similar evidence of association, although filtering according to CADD identified evidence of association between ANGPTL4 and High Density Lipoproteins (P = 0.02, N = 3,210), which was not observed in the other analyses. We also undertook genome-wide analyses to determine how filtering in this manner compared to conventional approaches for gene regions. Results suggested that filtering by annotations according to CADD, as well as other tools known as FATHMM-MKL and DANN, identified association signals not detected when filtering by variant consequence and vice versa. Incorporating variant annotations from non-coding bioinformatics tools should prove to be a valuable asset for rare variant analyses in the future. Filtering by variant consequence is only possible in coding regions of the genome, whereas utilising non-coding bioinformatics annotations provides an opportunity to discover unknown causal variants in non-coding regions as well.
This should allow

  3. ONEMercury: Towards Automatic Annotation of Earth Science Metadata

    Science.gov (United States)

    Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

    2012-12-01

    Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. 
We utilize the

  4. Creating video-annotated discussions: An asynchronous alternative

    Directory of Open Access Journals (Sweden)

    Craig D. Howard

    2010-01-01

    Full Text Available In this article the authors illustrate the design and development of a pedagogical intervention using video annotations in a pre-service teacher education course. An annotation platform was selected and video was shot to create a video backdrop on which asynchronous discussions would take place. The article addresses design considerations in the selection of video, the editing process, and the development of a tutorial to lead learners through their first experience with this form of discussion. Learner participation samples were collected, and an analysis of the design process concludes the article.

  5. SmashCommunity: A metagenomic annotation and analysis tool

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U

    2010-01-01

    SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quanti...

  6. VASP-E: Specificity Annotation with a Volumetric Analysis of Electrostatic Isopotentials

    Science.gov (United States)

    Chen, Brian Y.

    2014-01-01

    Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. PMID:25166865
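
    The precision and recall figures quoted above follow the standard set-based definitions. The following is a minimal sketch of that arithmetic only; the abstract does not report the underlying counts, so the residue labels and the function name `precision_recall` are hypothetical, chosen so that five of six reference residues are recovered with no false positives.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # items found in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical residue labels, invented for illustration only;
# they are not taken from the VASP-E study.
reference = {"D189", "S190", "K60", "E39", "R59", "D35"}
predicted = {"D189", "S190", "K60", "E39", "R59"}

p, r = precision_recall(predicted, reference)
print(f"precision={p:.1%} recall={r:.1%}")
```

    With these made-up sets, every predicted residue is in the reference set (precision 5/5) and one reference residue is missed (recall 5/6); other counts with the same ratios would give the same percentages.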

  7. Methods for eliciting, annotating, and analyzing databases for child speech development.

    Science.gov (United States)

    Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F

    2017-09-01

    Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of

  8. Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

    Directory of Open Access Journals (Sweden)

    Pietro Liò

    Full Text Available A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from a closely related model organism. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First we set out the rationale of our method by applying it within the same species, then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case when a certain number of experiments, from several species close to the species on which to make inference, are available. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate an effective utility in annotating new species sequences using all the available replicas from model species. The third case, where the gene network size varies among species, shows that the combination of information is less powerful but is still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.

  9. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

    Science.gov (United States)

    Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E

    2017-08-17

    Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not

  10. Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae.

    Science.gov (United States)

    Damte, Dereje; Suh, Joo-Won; Lee, Seung-Jin; Yohannes, Sileshi Belew; Hossain, Md Akil; Park, Seung-Chun

    2013-07-01

    In the present study, a computational comparative and subtractive genomic/proteomic analysis was performed to identify putative therapeutic target and vaccine candidate proteins from Kyoto Encyclopedia of Genes and Genomes (KEGG) annotated metabolic pathways of Mycoplasma hyopneumoniae, in support of drug design and vaccine production pipelines against M. hyopneumoniae. The comparative genomic and metabolic pathway analysis, run with a predefined computational workflow, extracted a total of 41 annotated metabolic pathways from KEGG, of which five were unique to M. hyopneumoniae. A total of 234 proteins were identified as being involved in these metabolic pathways. Of these, 125 non-homologous, predicted essential proteins could serve as potential drug targets and vaccine candidates; additional prioritizing parameters characterized 21 proteins as vaccine candidates, while the druggability of each identified protein, evaluated with the DrugBank database, prioritized 42 proteins as suitable drug targets. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Annotation-based enrichment of Digital Objects using open-source frameworks

    Directory of Open Access Journals (Sweden)

    Marcus Emmanuel Barnes

    2017-07-01

    Full Text Available The W3C Web Annotation Data Model, Protocol, and Vocabulary unify approaches to annotations across the web, enabling their aggregation, discovery and persistence over time. In addition, new JavaScript libraries provide the ability for users to annotate multi-format content. In this paper, we describe how we have leveraged these developments to provide annotation features alongside Islandora’s existing preservation, access, and management capabilities. We also discuss our experience developing with the Web Annotation Model as an open web architecture standard, as well as our approach to integrating mature external annotation libraries. The resulting software (the Web Annotation Utility Module for Islandora) accommodates annotation across multiple formats. This solution can be used in various digital scholarship contexts.

  12. A Generative Theory of Textbook Design: Using Annotated Illustrations To Foster Meaningful Learning of Science Text.

    Science.gov (United States)

    Mayer, Richard E.; And Others

    1995-01-01

    Explains a generative theory of textbook design and describes three experiments that compared college students' solutions on transfer problems after reading science texts with illustrations adjacent to corresponding text and including annotations, and illustrations separated from text without annotations. (LRW)

  13. Annotated Bibliography of Research in the Teaching of English

    Science.gov (United States)

    Beach, Richard; Bigelow, Martha; Dillon, Deborah; Dockter, Jessie; Galda, Lee; Helman, Lori; Kapoor, Richa; Ngo, Bic; O'Brien, David; Sato, Mistilina; Scharber, Cassie; Jorgensen, Karen; Liang, Lauren; Braaksma, Martine; Janssen, Tanja

    2009-01-01

    This article presents an annotated bibliography of research works about digital/technology tools for literacy instruction, discourse/cultural analysis, literacy, literary response/literature/narrative, media-information literacy/media use, professional development/teacher education related to English/language arts, reading, second language…

  14. Chicano Literature for Young Adults: An Annotated Bibliography.

    Science.gov (United States)

    Frankson, Marie Stewart

    1990-01-01

    Offers an 83-item annotated bibliography of works by Chicano authors suitable for high school students. Cites novels, short stories, poetry, drama, works in the oral tradition, anthologies, and biographies which are still in print and which are bilingual or mostly in English. Describes sources found to be most helpful in compiling this…

  15. Vind(x): Using the user through cooperative annotation

    NARCIS (Netherlands)

    Williams, A.D.; Vuurpijl, Louis; Schomaker, Lambert; van den Broek, Egon

    2002-01-01

    In this paper, the image retrieval system Vind(x) is described. The architecture of the system and first user experiences are reported. Using Vind(x), users on the Internet may cooperatively annotate objects in paintings by use of the pen or mouse. The collected data can be searched through

  16. The Performance Career of Charles Dickens: An Annotated Bibliography.

    Science.gov (United States)

    Gentile, John Samuel

    Offered in response to the broad appeal of Charles Dickens's performance career to various disciplines, this annotated bibliography lists 40 resources concerned with Dickens's success as a performer interpreting his literary works. The resources are categorized under books, theses and dissertations, articles in scholarly journals, nineteenth…

  17. Asian American Literature of Hawaii: An Annotated Bibliography.

    Science.gov (United States)

    Hiura, Arnold T.; Sumida, Stephen H.

    This annotated bibliography focuses on the drama, prose fiction, and poetry of people of Chinese, Japanese, Korean, and Filipino descent in Hawaii. All works cited were written in English, between the 1920s and 1970, with the exception of poems translated into English by their authors. The bibliography begins with an overview of the cultural and…

  18. Using Online Annotations to Support Error Correction and Corrective Feedback

    Science.gov (United States)

    Yeh, Shiou-Wen; Lo, Jia-Jiunn

    2009-01-01

    Giving feedback on second language (L2) writing is a challenging task. This research proposed an interactive environment for error correction and corrective feedback. First, we developed an online corrective feedback and error analysis system called "Online Annotator for EFL Writing". The system consisted of five facilities: Document Maker,…

  19. Freedom of Speech: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Tedford, Thomas L.

    Restricted to books on freedom of speech, this annotated bibliography offers a list of 38 references pertinent to the subject. Also included is a list of 18 ERIC documents on freedom of speech, and information on how to order them. (JC)

  20. Folklore of the North American Indians. An Annotated Bibliography.

    Science.gov (United States)

    Ullom, Judith C., Comp.

    Intended for compilers or retellers of folktales, for storytellers or librarians serving children, or for children themselves, the annotated bibliography contains references to 152 sources of North American Indian folktales. Sources in the non-comprehensive bibliography were selected on the basis of (1) a statement of sources and faithfulness to…

  1. Adolescent Literacy Resources: An Annotated Bibliography. Second Edition 2009

    Science.gov (United States)

    Center on Instruction, 2009

    2009-01-01

    This annotated bibliography, updated from a 2007 edition, is intended as a resource for technical assistance providers as they work with states on adolescent literacy. This revision includes current research and documents of practical use in guiding improvements in grades 4-12 reading instruction in the content areas and in interventions for…

  2. Different Approaches to Automatic Polarity Annotation at Synset Level

    NARCIS (Netherlands)

    Maks, E.; Vossen, P.T.J.M.; Sagot, B.

    2011-01-01

    In this paper we explore two approaches for the automatic annotation of polarity (positive, negative and neutral) of adjective synsets in Dutch. Both approaches focus on the creation of a Dutch polarity lexicon at word sense level using wordnet as a lexical resource. The first method is based upon

  3. Sequence-based feature prediction and annotation of proteins

    DEFF Research Database (Denmark)

    Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....

  4. Annotation des structures discursives : l'expérience ANNODIS (Annotation of Discourse Structures: The ANNODIS Experience)

    Directory of Open Access Journals (Sweden)

    Ho-Dac Lydia-Mai

    2014-07-01

    Full Text Available The ANNODIS resource is a diversified corpus of written French enriched with discourse-level annotations. Its originality lies in pooling two complementary approaches which, through their contrasts and convergences, raise a number of questions about the annotation of discourse structures. This article revisits the main goals that motivated the members of the ANNODIS project: (1) to stabilize a number of linguistic definitions of targeted discourse phenomena, and (2) to confront a particular model of the construction of discourse coherence with real data. This twofold objective reflects the two approaches put to the test in the ANNODIS experiment. The article revisits the issues raised by this resource in terms of both discourse structures and the annotation campaign. Particular attention is paid to the question of what becomes of the annotations, especially in a field that is still not well stabilized.

  5. Ethical Issues in Health Services: A Report and Annotated Bibliography.

    Science.gov (United States)

    Carmody, James

    This publication identifies, discusses, and lists areas for further research for five ethical issues related to health services: 1) the right to health care; 2) death and euthanasia; 3) human experimentation; 4) genetic engineering; and, 5) abortion. Following a discussion of each issue is a selected annotated bibliography covering the years 1967…

  6. The human factor in ecological research: an annotated bibliography.

    Science.gov (United States)

    Carol Eckhardt

    1998-01-01

    As a bibliography of annotated references addressing interdisciplinary environmental research, the collection reviews a broad spectrum of literature to illustrate the breadth of issues that bear on the role of humankind in environmental context. Categories of culture, environmental law, public policy, environmental valuation strategies, philosophy, interdisciplinary...

  7. Latin America: An Annotated List of Materials for Children.

    Science.gov (United States)

    United Nations Children's Fund, New York, NY. United States Committee.

    This annotated bibliography of materials on Latin America is intended for children to age 14. South and Central America, Mexico, and the French, English, and Spanish speaking areas of the Caribbean are covered. Listings are by country and include history books, geography books, fiction, nonfiction, poetry, and folklore books. Some works in Spanish…

  8. Biochemical Space: A Framework for Systemic Annotation of Biological Models

    Czech Academy of Sciences Publication Activity Database

    Klement, M.; Děd, T.; Šafránek, D.; Červený, Jan; Müller, Stefan; Steuer, Ralf

    2014-01-01

    Vol. 306, JUL (2014), pp. 31-44 ISSN 1571-0661 R&D Projects: GA MŠk(CZ) EE2.3.20.0256 Institutional support: RVO:67179843 Keywords : biological models * model annotation * systems biology * cyanobacteria Subject RIV: EH - Ecology, Behaviour

  9. Asian American Literature, January 1992-June 1996: An Annotated Bibliography.

    Science.gov (United States)

    Lim, Shirley Geok-Lin; Williams, Angela Noelle

    1997-01-01

    Contains an annotated bibliography of Asian American literature (published between 1992 and 1996) focusing on prose narratives, including novels, short stories, and memoirs. Includes also some critical studies of Asian American literature and some drama and award-winning poetry collections. (TB)

  10. An annotated catalogue of the generic names of the Bromeliaceae

    NARCIS (Netherlands)

    Grant, J.R.; Zijlstra, G.

    1998-01-01

    An annotated catalogue of the known generic names of the Bromeliaceae is presented. It accounts for 187 names in six lists: I. Generic names (133), II. Invalid names (7), III. A synonymized checklist of the genera of the Bromeliaceae (56 accepted genera, and 77 synonyms), IV. Nothogenera (bigeneric

  11. Bibliographie Annotee de Linguistique Acadienne (Annotated Bibliography of Acadian Linguistics).

    Science.gov (United States)

    Gesner, Edward

    A bibliography of 430 books, journal articles, papers, and other references on Acadian French written in English or French is divided into two principal sections: an annotated bibliography of works focusing on the Acadian French dialect spoken in the Canadian Maritime Provinces, and an unannotated bibliography pertaining to Louisiana Acadian…

  12. Effects of Teaching Strategies in Annotated Bibliography Writing

    Science.gov (United States)

    Tan-de Ramos, Jennifer

    2015-01-01

    The study examines the effect of teaching strategies on improving the writing of students at the tertiary level. Specifically, three teaching approaches--the use of modelling, grammar-based, and information element-focused--were tested on their effect on the writing of annotated bibliography in three research classes at a university in Manila.…

  13. Genome Annotation in a Community College Cell Biology Lab

    Science.gov (United States)

    Beagley, C. Timothy

    2013-01-01

    The Biology Department at Salt Lake Community College has used the IMG-ACT toolbox to introduce a genome mapping and annotation exercise into the laboratory portion of its Cell Biology course. This project provides students with an authentic inquiry-based learning experience while introducing them to computational biology and contemporary learning…

  14. Resources for Achieving Sex Equity: An Annotated Bibliography.

    Science.gov (United States)

    Miller, Susan W., Comp.

    This annotated bibliography provides a list of resources dealing with sex equity in vocational education. The bibliography first provides operational definitions of "sexism,""sex fair,""sex affirmative,""sex bias," and "affirmative action." It then lists resources under the following topics and/or bibliographic forms: (1) sex role definition, (2)…

  15. Annotated checklist of fungi in Cyprus Island. 1. Larger Basidiomycota

    Directory of Open Access Journals (Sweden)

    Miguel Torrejón

    2014-06-01

    Full Text Available An annotated checklist of wild fungi living on Cyprus Island has been compiled, bringing together all the information collected from the different works dealing with fungi in this area throughout the three centuries of mycology in Cyprus. This part contains 363 taxa of macroscopic Basidiomycota.

  16. Annotated Football Bibliography. An Applied Project in Physical Education.

    Science.gov (United States)

    Clemence, William J., Jr.; Pitts, James Walter

    This annotated bibliography was compiled to assist physical education majors, especially those having a major interest in football and football coaching. The bibliography is limited to the areas of coaching techniques and philosophy, fundamentals, offense, defense, injuries, and conditioning at the high school and college level. These broader…

  17. An Annotated Bibliography of Isotonic Weight-Training Methods.

    Science.gov (United States)

    Wysong, John V.

    This literature study was conducted to compare and evaluate various types and techniques of weight lifting so that a weight lifting program could be selected or devised for a secondary school. Annotations of 32 research reports, journal articles, and monographs on isotonic strength training are presented. The literature in the first part of the…

  18. Pertinent Discussions Toward Modeling the Social Edition: Annotated Bibliographies

    NARCIS (Netherlands)

    Siemens, R.; Timney, M.; Leitch, C.; Koolen, C.; Garnett, A.

    2012-01-01

    The two annotated bibliographies present in this publication document and feature pertinent discussions toward the activity of modeling the social edition, first exploring reading devices, tools and social media issues and, second, social networking tools for professional readers in the Humanities.

  19. Communication in a Diverse Classroom: An Annotated Bibliographic Review

    Science.gov (United States)

    Brown, Rachelle

    2016-01-01

    Students have social and personal needs to fulfill and communicate these needs in different ways. This annotated bibliographic review examined communication studies to provide educators of diverse classrooms with ideas to build an environment that contributes to student well-being. Participants in the studies ranged in age, ability, and cultural…

  20. RADIOISOTOPE EXPERIMENTS IN HIGH SCHOOL BIOLOGY, AN ANNOTATED SELECTED BIBLIOGRAPHY.

    Science.gov (United States)

    HURLBURT, EVELYN M.

    SELECTED REFERENCES ON THE USE OF RADIOISOTOPES IN BIOLOGY ARE CONTAINED IN THIS ANNOTATED BIBLIOGRAPHY FOR SECONDARY SCHOOL STUDENTS. MATERIALS INCLUDED WERE PUBLISHED AFTER 1960 AND DEAL WITH THE PROPERTIES OF RADIATION, SIMPLE RADIATION DETECTION PROCEDURES, AND TECHNIQUES FOR USING RADIOISOTOPES EXPERIMENTALLY. THE REFERENCES ARE LISTED IN…

  1. Annotated bibliography of highly ionized atoms of importance to plasmas

    International Nuclear Information System (INIS)

    Schmieder, R.W.

    1975-04-01

    A bibliography is presented of the literature on highly ionized atoms which have relevance to plasmas. The bibliography is annotated with keywords, and indexed by subjects and authors. It should be of greatest use to researchers working on the problems of impurity cooling and diagnostics of CTR plasmas. (U.S.)

  2. House dust mites in Brazil - an annotated bibliography

    Directory of Open Access Journals (Sweden)

    Binotti Raquel S

    2001-01-01

    Full Text Available House dust mites have been reported to be the most important allergen in human dwellings. Several articles have already shown the presence of different mite species in homes in Brazil, with Pyroglyphidae, Glycyphagidae and Cheyletidae the most important families found. This paper is an annotated bibliography that will lead to a better knowledge of house dust mite fauna in Brazil.

  3. On temporality in discourse annotation : Theoretical and practical considerations

    NARCIS (Netherlands)

    Evers-Vermeul, J.; Hoek, J.; Scholman, M.C.J.

    2017-01-01

    Temporal information is one of the prominent features that determine the coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we will argue that temporal order is a relational rather than a segment-specific

  4. Ontology-Based Annotation and Ranking Service for Geoscience

    Science.gov (United States)

    Sainju, R.; Ramachandran, R.; Li, X.; McEniry, M.; Kulkarni, A.; Conover, H.

    2012-12-01

    There is a need to automatically annotate information using either a controlled vocabulary or an ontology, so that the information is not only easily discoverable but can also be linked to other information based on these semantic annotations. We present an ontology annotation and ranking service designed to address this need. The service can be configured to use an ontology describing a specific application domain. Given text inputs, this service generates annotations whenever it finds terms that occur in both the text and the ontology. The service is also capable of ranking the different inputs based on their "contextual" similarity to the information captured in the ontology. To rank a given input, the service uses a specialized algorithm that calculates both an ontological score, based on precomputed weights of the intersecting terms from the ontology, and a statistical score using the traditional term frequency-inverse document frequency (TF-IDF) approach. Both scores are normalized and combined to generate the final ranking. An example application of this service is finding relevant datasets for studying hurricanes within NASA's data catalog. A hurricane ontology is used to index and rank all the dataset descriptions from the metadata catalog, and only the datasets that rank high are presented to the end users as contextually relevant for studying hurricanes.
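
    The two-score ranking described in this record can be illustrated with a minimal sketch. It is not the service's actual algorithm: the tokenizer, the max-based normalization, the use of a plain sum to combine the scores, and the example ontology weights are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation (an assumption)."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]


def rank_documents(docs, ontology_weights):
    """Rank documents by a normalized ontological score plus a normalized TF-IDF score.

    ontology_weights maps ontology terms to precomputed weights (hypothetical values).
    Returns (document indices ordered best first, combined score per document).
    """
    tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency of each ontology term across the collection.
    df = {term: sum(1 for toks in tokens if term in toks) for term in ontology_weights}

    onto_scores, tfidf_scores = [], []
    for toks in tokens:
        counts = Counter(toks)
        matched = [t for t in ontology_weights if counts[t] > 0]
        # Ontological score: sum of the weights of terms found in both text and ontology.
        onto_scores.append(sum(ontology_weights[t] for t in matched))
        # Statistical score: TF-IDF summed over the matched terms.
        tfidf_scores.append(
            sum((counts[t] / len(toks)) * math.log(n_docs / df[t]) for t in matched)
        )

    def normalize(scores):
        top = max(scores)
        return [s / top if top > 0 else 0.0 for s in scores]

    # Combining by simple addition is an assumption; the record does not say how.
    combined = [o + s for o, s in zip(normalize(onto_scores), normalize(tfidf_scores))]
    order = sorted(range(n_docs), key=lambda i: combined[i], reverse=True)
    return order, combined


docs = [
    "Hurricane wind speed and storm surge data",
    "Sea ice extent data",
    "Hurricane track data",
]
weights = {"hurricane": 1.0, "storm": 0.5, "surge": 0.5}  # hypothetical ontology weights
order, scores = rank_documents(docs, weights)
```

    Normalizing each score before combining keeps either component from dominating simply because of its scale.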

  5. Annotated Bibliography; Freedom of Information Center Reports and Summary Papers.

    Science.gov (United States)

    Freedom of Information Center, Columbia, MO.

    This bibliography lists and annotates almost 400 information reports, opinion papers, and summary papers dealing with freedom of information. Topics covered include the nature of press freedom and increased press efforts toward more open access to information; the press situation in many foreign countries, including France, Sweden, Communist…

  6. From protein interactions to functional annotation: graph alignment in Herpes

    Czech Academy of Sciences Publication Activity Database

    Kolář, Michal; Lassig, M.; Berg, J.

    2008-01-01

    Vol. 2, No. 90 (2008), e-e. ISSN 1752-0509. Institutional research plan: CEZ:AV0Z50520514. Keywords: graph alignment; functional annotation; protein orthology. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.706 (2008)

  7. Reliability and effectiveness of clickthrough data for automatic image annotation

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; De Vries, A.P.; Delopoulos, A.

    2010-01-01

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive

  8. Supervised learning of semantic classes for image annotation and retrieval.

    Science.gov (United States)

    Carneiro, Gustavo; Chan, Antoni B; Moreno, Pedro J; Vasconcelos, Nuno

    2007-03-01

    A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.

  9. Reliability and effectiveness of clickthrough data for automatic image annotation

    NARCIS (Netherlands)

    T. Tsikrika (Theodora); C. Diou; A.P. de Vries (Arjen); A. Delopoulos

    2010-01-01

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the

  10. Energy: An Annotated Bibliography of Selected Energy Education Materials.

    Science.gov (United States)

    Massachusetts Audubon Society, Lincoln. Hatheway Environmental Education Inst.

    This is an annotated bibliography of selected energy education materials. These materials were selected according to the following criteria: (1) Usability in an instructional atmosphere; (2) Relevancy to issues on energy use in the environment; (3) Accuracy and current relevancy of energy facts and trends; (4) Attractiveness of format including…

  11. An annotated corpus for the analysis of VP ellipsis

    NARCIS (Netherlands)

    Bos, Johan; Spenader, J.

    2011-01-01

    Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We

  12. Food, Nutrition and the Disabled: An Annotated Bibliography.

    Science.gov (United States)

    Furse, Alison, Comp.; Levine, Elyse, Comp.

    The annotated bibliography presents approximately 200 citations of printed and 25 citations of audiovisual materials on nutrition and its relation to disabilities. Citations are organized alphabetically by author within the following topics: general, child care, home management, nutrition, mealtime skills and behavior, aids and devices, and…

  13. Exploring Metacognitive Strategies and Hypermedia Annotations on Foreign Language Reading

    Science.gov (United States)

    Shang, Hui-Fang

    2017-01-01

    The effective use of reading strategies has been recognized as an important way to increase reading comprehension in hypermedia environments. The purpose of the study was to explore whether metacognitive strategy use and access to hypermedia annotations facilitated reading comprehension based on English as a foreign language students' proficiency…

  14. Optimizing high performance computing workflow for protein functional annotation.

    Science.gov (United States)

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
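
    The sensitivity and specificity figures quoted in this record follow the standard definitions, which can be sketched as below. The cluster labels are hypothetical and the function is an illustration of the metrics only, not part of the published workflow.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Return (sensitivity, specificity) for one cluster treated as the positive class."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical cluster assignments for five proteins.
truth = ["COG1", "COG1", "COG2", "COG2", "COG2"]
predicted = ["COG1", "COG2", "COG2", "COG2", "COG1"]
sens, spec = sensitivity_specificity(truth, predicted, positive="COG1")
```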

  15. Annotation modeling with formal ontologies: Implications for informal ontologies

    Science.gov (United States)

    Lumb, L. I.; Freemantle, J. R.; Lederman, J. I.; Aldridge, K. D.

    2009-04-01

    Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium's (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL's internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus, the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

  16. The Holocaust in Books and Films. A Selected, Annotated List.

    Science.gov (United States)

    Muffs, Judith Herschlag, Ed.

    This is an annotated list of over 400 resource books and films on Jewish history before the Holocaust, the history and development of Nazism, experiences during the Holocaust and its aftermath, and the phenomenon of prejudice and anti-Semitism. Designed primarily for teachers and librarians in secondary schools and to some extent in elementary…

  17. Wanda ML - a markup language for digital annotation

    NARCIS (Netherlands)

    Franke, K.Y.; Guyon, I.; Schomaker, L.R.B.; Vuurpijl, L.G.

    2004-01-01

    WANDAML is an XML-based markup language for the annotation and filter journaling of digital documents. It addresses in particular the needs of forensic handwriting data examination, by allowing experts to enter information about writer, material (pen, paper), script and content, and to record chains

  18. The WANDAML Markup Language for Digital Document Annotation

    NARCIS (Netherlands)

    Franke, K.; Guyon, I.; Schomaker, L.; Vuurpijl, L.

    2004-01-01

    WANDAML is an XML-based markup language for the annotation and filter journaling of digital documents. It addresses in particular the needs of forensic handwriting data examination, by allowing experts to enter information about writer, material (pen, paper), script and content, and to record chains

  19. Annotated Bibliography on Return Migration to Puerto Rico.

    Science.gov (United States)

    Carrasquillo, Angela; Carrasquillo, Ceferino

    This paper is an annotated bibliography on return migration from the mainland United States to Puerto Rico. An introduction defines the term "return migration" in the specific context of the Puerto Rican community. The introduction is followed by the bibliography, which lists and summarizes research studies and works dealing with…

  20. Annotation: Neurofeedback--Train Your Brain to Train Behaviour

    Science.gov (United States)

    Heinrich, Hartmut; Gevensleben, Holger; Strehl, Ute

    2007-01-01

    Background: Neurofeedback (NF) is a form of behavioural training aimed at developing skills for self-regulation of brain activity. Within the past decade, several NF studies have been published that tend to overcome the methodological shortcomings of earlier studies. This annotation describes the methodical basis of NF and reviews the evidence…

  1. An Oral History Annotation Tool for INTER-VIEWs

    NARCIS (Netherlands)

    Heuvel, H. van den; Sanders, E.P.; Rutten, R.; Scagliola, S.; Witkamp, P.

    2012-01-01

    We present a web-based tool for retrieving and annotating audio fragments of e.g. interviews. Our collection contains 250 interviews with veterans of Dutch conflicts and military missions. The audio files of the interviews were disclosed using ASR technology focussed at keyword retrieval. Resulting

  2. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  3. Computational analyses and annotations of the Arabidopsis peroxidase gene family

    DEFF Research Database (Denmark)

    Østergaard, Lars; Pedersen, Anders Gorm; Jespersen, Hans M.

    1998-01-01

    lack of peroxidase substrate specificity. Computational analysis was performed on 30 near full-length Arabidopsis peroxidase cDNAs for annotation of start codons and signal peptide cleavage sites. A compositional analysis revealed that 23 of the 30 peroxidase cDNAs have 5' untranslated regions...

  4. Gabriele D'Annunzio, Pleasure, translated and annotated by Lara ...

    African Journals Online (AJOL)

    User

    Gabriele D'Annunzio, Pleasure, translated and annotated by. Lara Gochin Raffaelli, introduction by Alexander Stille,. New York, Penguin, 2013, pp. 355. Undertaking the translation of a literary work by any renowned author is always a daunting task. Thus the challenge presented by the intricate prose of the sophisticated ...

  5. Protein annotation in the era of personal genomics

    DEFF Research Database (Denmark)

    Holberg Blicher, Thomas; Gupta, Ramneek; Wesolowska, Agata

    2010-01-01

    Protein annotation provides a condensed and systematic view on the function of individual proteins. It has traditionally dealt with sorting proteins into functional categories, which for example has proven to be successful for the comparison of different species. However, if we are to understand...

  6. prokaryote genome annotation with GeneScan and GLIMMER

    Indian Academy of Sciences (India)

    Unknown

    fications made hitherto might require re-evaluation. All these cases are discussed in detail. [Aggarwal G and Ramaswamy R 2002 Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER;. J. Biosci. (Suppl. 1) 27 7–14]. 1. Introduction. The increased effort in genome sequencing has led to a.

  7. Feeling Expression Using Avatars and Its Consistency for Subjective Annotation

    Science.gov (United States)

    Ito, Fuyuko; Sasaki, Yasunari; Hiroyasu, Tomoyuki; Miki, Mitsunori

    Consumer Generated Media (CGM) is growing rapidly and the amount of content is increasing. However, it is often difficult for users to extract important contents and the existence of contents recording their experiences can easily be forgotten. As there are no methods or systems to indicate the subjective value of the contents or ways to reuse them, subjective annotation appending subjectivity, such as feelings and intentions, to contents is needed. Representation of subjectivity depends on not only verbal expression, but also nonverbal expression. Linguistically expressed annotation, typified by collaborative tagging in social bookmarking systems, has come into widespread use, but there is no system of nonverbally expressed annotation on the web. We propose the utilization of controllable avatars as a means of nonverbal expression of subjectivity, and confirmed the consistency of feelings elicited by avatars over time for an individual and in a group. In addition, we compared the expressiveness and ease of subjective annotation between collaborative tagging and controllable avatars. The result indicates that the feelings evoked by avatars are consistent in both cases, and using controllable avatars is easier than collaborative tagging for representing feelings elicited by contents that do not express meaning, such as photos.

  8. Semantic annotation of morphological descriptions: an overall strategy

    Directory of Open Access Journals (Sweden)

    Cui Hong

    2010-05-01

    Full Text Available Background: Large volumes of morphological descriptions of whole organisms have been created as print or electronic text in a human-readable format. Converting the descriptions into computer-readable formats gives a new life to the valuable knowledge on biodiversity. Research in this area started 20 years ago, yet insufficient progress has been made to produce an automated system that requires only minimal human intervention but works on descriptions of various plant and animal groups. This paper attempts to examine the hindering factors by identifying the mismatches between existing research and the characteristics of morphological descriptions. Results: This paper reviews the techniques that have been used for automated annotation, reports exploratory results on characteristics of morphological descriptions as a genre, and identifies challenges facing automated annotation systems. Based on these criteria, the paper proposes an overall strategy for converting descriptions of various taxon groups with the least human effort. Conclusions: A combined unsupervised and supervised machine learning strategy is needed to construct domain ontologies and lexicons and to ultimately achieve automated semantic annotation of morphological descriptions. Further, we suggest that each effort in creating a new description or annotating an individual description collection should be shared and contribute to the "biodiversity information commons" for the Semantic Web. This cannot be done without a sound strategy and a close partnership between and among information scientists and biologists.

  9. An Annotated Bibliography of the Gestalt Methods, Techniques, and Therapy

    Science.gov (United States)

    Prewitt-Diaz, Joseph O.

    The purpose of this annotated bibliography is to provide the reader with a guide to relevant research in the area of Gestalt therapy, techniques, and methods. The majority of the references are journal articles written within the last 5 years or documents easily obtained through interlibrary loans from local libraries. These references were…

  10. Identification and annotation of promoter regions in microbial ...

    Indian Academy of Sciences (India)

    PRAKASH KUMAR

    2007-06-15

    Jun 15, 2007 ... [Rangannan V and Bansal M 2007 Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability;. J. Biosci. ... (Version 9.1, updated on 12th May, 2005) (Keseler et al. 2005). ... The stability of a double stranded DNA molecule can be expressed in terms of the ...

  11. Coleoptera of the Idaho National Engineering Laboratory: an annotated checklist

    Energy Technology Data Exchange (ETDEWEB)

    Stafford, M.P.; Barr, W.F.; Johnson, J.B.

    1986-04-30

    An insect survey was conducted on the Idaho National Engineering Laboratory during the summers of 1981-1983. This site is on the Snake River Plains in southeastern Idaho. Presented here is an annotated checklist of the Coleoptera collected. Successful collecting methods, dates of adult occurrence, and relative abundance are given for each species. Relevant biological information is also presented for some species.

  12. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-01-01

    Background: Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results: The presence of many AMs and the development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions: We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
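
    The idea of an extended annotation built by combining individual ones can be sketched as follows. This is a simplified illustration, not BEACON's method: the data layout (one gene-to-function mapping per annotation method), the method names, and the gene identifiers are assumptions.

```python
def extend_annotation(annotations_by_method):
    """Combine per-method annotations into one mapping: gene -> set of putative functions."""
    combined = {}
    for annotation in annotations_by_method.values():
        for gene, function in annotation.items():
            functions = combined.setdefault(gene, set())
            if function:  # None or "" means the method assigned no function
                functions.add(function)
    return combined


def annotated_fraction(annotation):
    """Fraction of genes that carry at least one function assignment."""
    if not annotation:
        return 0.0
    return sum(1 for f in annotation.values() if f) / len(annotation)


# Hypothetical output of two annotation methods for the same three genes.
methods = {
    "AM1": {"g1": "kinase", "g2": None, "g3": None},
    "AM2": {"g1": "kinase", "g2": "transporter", "g3": None},
}
extended = extend_annotation(methods)
```

    In this toy case the extended annotation covers two of three genes, whereas the first method alone covers one, which mirrors the kind of gain the record reports.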

  14. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  15. NegGOA: negative GO annotations selection using ontology structure.

    Science.gov (United States)

    Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

    2016-10-01

    Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples: proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always provides only the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help in distinguishing true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations and the potential for additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy over the compared algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete annotations of proteins than the compared methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa. Contact: gxyu@swu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
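
    How ontology structure can narrow the choice of negative examples may be illustrated with a deliberately simple heuristic. This is not NegGOA: it only excludes terms implied by existing annotations (ancestors, under the true path rule) and terms that could still be added as more specific annotations (descendants). The term names and the parent map are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of a term in a DAG given as {term: [parent terms]}."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidate_negatives(annotated, parents):
    """Terms that are neither implied by nor a refinement of a protein's annotations."""
    annotated = set(annotated)
    all_terms = set(parents) | {p for ps in parents.values() for p in ps}
    # True path rule: an annotation to a term implies all of its ancestors.
    implied = set(annotated)
    for term in annotated:
        implied |= ancestors(term, parents)
    # Descendants of annotated terms may be added later, so they are not safe negatives.
    possible = {t for t in all_terms if ancestors(t, parents) & annotated}
    return all_terms - implied - possible


# Hypothetical miniature ontology (child -> parents).
parents = {
    "root": [],
    "binding": ["root"],
    "dna_binding": ["binding"],
    "catalysis": ["root"],
    "kinase": ["catalysis"],
}
negatives = candidate_negatives({"binding"}, parents)
```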

  16. Missing genes in the annotation of prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Feng Wu-chun

    2010-03-01

    Full Text Available Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations are systematically missing small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
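
    The 33 aa length threshold mentioned in this record can be made concrete with a small ORF scan. This is a sketch under stated assumptions, not the authors' pipeline: an ORF is taken to run from an ATG to the next in-frame stop codon, alternative start codons are ignored, and coordinates on the "-" strand refer to the reverse-complemented sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=33):
    """Return (strand, frame, start, end) for ATG..stop ORFs of at least min_aa residues.

    start and end are 0-based, end-exclusive positions on the scanned strand;
    the stop codon is included in the span but not counted as a residue.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Synthetic example: a start codon, 32 alanine codons, and a stop codon (33 residues).
example = "ATG" + "GCT" * 32 + "TAA"
orfs = find_orfs(example)
```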

  17. OMIGA: Optimized Maker-Based Insect Genome Annotation.

    Science.gov (United States)

    Liu, Jinding; Xiao, Huamei; Huang, Shuiqing; Li, Fei

    2014-08-01

    Insects are one of the largest classes of animals on Earth and constitute more than half of all living species. The i5k initiative has begun sequencing of more than 5,000 insect genomes, which should greatly help in exploring insect resources and pest control. Insect genome annotation remains challenging because many insects have high levels of heterozygosity. To improve the quality of insect genome annotation, we developed a pipeline, named Optimized Maker-Based Insect Genome Annotation (OMIGA), to predict protein-coding genes from insect genomes. We first mapped RNA-Seq reads to genomic scaffolds to determine transcribed regions using Bowtie, and the putative transcripts were assembled using Cufflinks. We then selected highly reliable transcripts with intact coding sequences to train de novo gene prediction software, including Augustus. The re-trained software was used to predict genes from insect genomes. Exonerate was used to refine gene structure and to determine near-exact exon/intron boundaries in the genome. Finally, we used the software Maker to integrate data from RNA-Seq, de novo gene prediction, and protein alignment to produce an official gene set. The OMIGA pipeline was used to annotate the draft genome of an important insect pest, Chilo suppressalis, yielding 12,548 genes. Different strategies were compared, which demonstrated that OMIGA had the best performance. In summary, we present a comprehensive pipeline for identifying genes in insect genomes that can be widely used to improve the annotation quality in insects. OMIGA is provided at http://ento.njau.edu.cn/omiga.html .

  18. Digital Ink: In-Class Annotation of PowerPoint Lectures

    Science.gov (United States)

    Johnson, Anne E.

    2008-01-01

    Digital ink is a tool that, in conjunction with Microsoft PowerPoint software, allows real-time freehand annotation of presentations. Annotation of slides during class encourages student engagement with the material and problems under discussion. Digital ink annotation is a technique suitable for teaching across many disciplines, but is especially…

  19. Integrating Annotations into a Dual-Slide PowerPoint Presentation for Classroom Learning

    Science.gov (United States)

    Lai, Yen-Shou; Tsai, Hung-Hsu; Yu, Pao-Ta

    2011-01-01

    This study introduces a learning environment integrating annotations with a dual-slide PowerPoint presentation for classroom learning. Annotation means a kind of additional information to emphasize the explanations for the learning objects. The use of annotations is to support the cognitive process for PowerPoint presentation in a classroom. The…

  20. Music journals in South Africa 1854-2010: an annotated bibliography

    African Journals Online (AJOL)

    Music journals in South Africa 1854-2010: an annotated bibliography. ... The article focuses on presenting an annotated bibliography of music journalism in South Africa from as early as 1854 until 2010. Most of ... Key words: annotated bibliography, electronic journals, music journals, periodicals, South African music history ...

  1. Effects of Reviewing Annotations and Homework Solutions on Math Learning Achievement

    Science.gov (United States)

    Hwang, Wu-Yuin; Chen, Nian-Shing; Shadiev, Rustam; Li, Jin-Sing

    2011-01-01

    Previous studies have demonstrated that making annotations can be a meaningful and useful learning method that promotes metacognition and enhances learning achievement. A web-based annotation system, Virtual Pen (VPEN), which provides for the creation and review of annotations and homework solutions, has been developed to foster learning process…

  2. Effects of Annotations and Homework on Learning Achievement: An Empirical Study of Scratch Programming Pedagogy

    Science.gov (United States)

    Su, Addison Y. S.; Huang, Chester S. J.; Yang, Stephen J. H.; Ding, T. J.; Hsieh, Y. Z.

    2015-01-01

    In Taiwan elementary schools, Scratch programming has been taught for more than four years. Previous studies have shown that making personal annotations is a useful learning method that improves learning performance. An annotation-based Scratch programming (ASP) system provides for the creation, sharing, and review of annotations and homework solutions in…

  3. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    Science.gov (United States)

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  4. m6ASNP: a tool for annotating genetic variants by m6A function.

    Science.gov (United States)

    Jiang, Shuai; Xie, Yubin; He, Zhihao; Zhang, Ya; Zhao, Yuli; Chen, Li; Zheng, Yueyuan; Miao, Yanyan; Zuo, Zhixiang; Ren, Jian

    2018-04-02

    Large-scale genome sequencing projects have identified many genetic variants for diverse diseases. A major goal of these projects is to characterize these genetic variants to provide insight into their function and roles in diseases. N6-methyladenosine (m6A) is one of the most abundant RNA modifications in eukaryotes. Recent studies have revealed that aberrant m6A modifications are involved in many diseases. In this study, we present a user-friendly web server called "m6ASNP" that is dedicated to the identification of genetic variants targeting m6A modification sites. A random forest model was implemented in m6ASNP to predict whether the methylation status of an m6A site is altered by the variants surrounding the site. In m6ASNP, genetic variants in a standard VCF format are accepted as the input data, and the output includes an interactive table containing the genetic variants annotated by m6A function. In addition, statistical diagrams and a genome browser are provided to visualize the characteristics and annotate the genetic variants. We believe that m6ASNP is a highly convenient tool that can be used to boost further functional studies investigating genetic variants. The web server "m6ASNP" is implemented in JAVA and PHP and is freely available at http://m6asnp.renlab.org.
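
    The input side of such a tool can be illustrated with a small sketch that reads VCF data lines and collects the variants lying near a given m6A site. This is not m6ASNP's code, and the random forest step is not reproduced. The flank width of 10 nt, the example records, and the function names are assumptions made for illustration. Distance is measured from the VCF POS field only, so indel lengths are ignored.

```python
def parse_vcf(lines):
    """Yield (chrom, pos, ref, alt) from VCF text lines, skipping header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one record per alternate allele
            yield chrom, int(pos), ref, allele


def variants_near_sites(variants, sites, flank=10):
    """Map each (chrom, pos) m6A site to the variants within +/- flank nt."""
    hits = {site: [] for site in sites}
    for chrom, pos, ref, alt in variants:
        for site_chrom, site_pos in sites:
            if chrom == site_chrom and abs(pos - site_pos) <= flank:
                hits[(site_chrom, site_pos)].append((pos, ref, alt))
    return hits


# Hypothetical VCF content and a hypothetical m6A site
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1005\t.\tA\tG\t.\t.\t.",
    "chr1\t2000\t.\tC\tT,G\t.\t.\t.",
    "chr2\t1002\t.\tG\tA\t.\t.\t.",
]
SITES = [("chr1", 1000)]
print(variants_near_sites(parse_vcf(VCF_LINES), SITES))
```

    In a real workflow the variants collected for each site would become features for a classifier; here they are only grouped.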

  5. Essential Annotation Schema for Ecology (EASE)-A framework supporting the efficient data annotation and faceted navigation in ecology.

    Science.gov (United States)

    Pfaff, Claas-Thido; Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

    2017-01-01

    Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
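
    Faceted navigation as described above amounts to filtering search objects by categorical and numeric parameters. A minimal sketch follows, assuming records are plain dictionaries. The field names (`biome`, `organism`, `year`) and the function `facet_filter` are hypothetical and are not taken from the EASE schema or vocabulary.

```python
def facet_filter(records, categorical=None, numeric=None):
    """Keep records matching every selected facet.

    categorical: field -> set of allowed values
    numeric:     field -> (low, high) inclusive range
    """
    categorical = categorical or {}
    numeric = numeric or {}
    selected = []
    for record in records:
        ok_cat = all(record.get(k) in allowed for k, allowed in categorical.items())
        ok_num = all(
            k in record and low <= record[k] <= high
            for k, (low, high) in numeric.items()
        )
        if ok_cat and ok_num:
            selected.append(record)
    return selected


# Hypothetical annotated datasets
DATASETS = [
    {"id": "d1", "biome": "forest", "organism": "plants", "year": 2009},
    {"id": "d2", "biome": "grassland", "organism": "plants", "year": 2014},
    {"id": "d3", "biome": "forest", "organism": "insects", "year": 2015},
]
forest_recent = facet_filter(
    DATASETS, categorical={"biome": {"forest"}}, numeric={"year": (2010, 2016)}
)
print([d["id"] for d in forest_recent])
```

    Each additional facet narrows the result set, which is how selecting parameters during a search refines the results.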

  6. Essential Annotation Schema for Ecology (EASE)—A framework supporting the efficient data annotation and faceted navigation in ecology

    Science.gov (United States)

    Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

    2017-01-01

    Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines. PMID:29023519

  7. Essential Annotation Schema for Ecology (EASE)-A framework supporting the efficient data annotation and faceted navigation in ecology.

    Directory of Open Access Journals (Sweden)

    Claas-Thido Pfaff

    Full Text Available Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.

  8. How to annotate morphologically rich learner language. Principles, problems and solutions

    Directory of Open Access Journals (Sweden)

    Sisko Brunni

    2015-05-01

    Full Text Available This article illustrates the grammatical and error annotations of a morphologically rich learner language with the help of the International Corpus of Learner Finnish (ICLFI. It especially focuses on problems and solutions in morphological and error annotation, both of which are challenging due to the rich morphological structure of the target language. The article also introduces existing Finno-Ugric learner data and their annotation schemes, and compares those with the ones used in ICLFI annotations. Learner data variables, taxonomy, and principles in grammatical and error annotation are also discussed with the help of the ICLFI in the present article.

  9. Annotated bibliography about Leonese wrestling (1977-2015)

    Directory of Open Access Journals (Sweden)

    José Antonio Robles-Tascón

    2017-11-01

    Full Text Available The purpose of this study was to create an annotated bibliography about Leonese wrestling. The first author's library was the starting point and then the catalogs of the National Library of Spain, Public Libraries of Spain, Spanish University Libraries Network, as well as the Spanish ISBN Agency Database of books published in Spain were consulted by using the keywords “lucha leonesa” and “aluche”. The annotated bibliography comprises a total of 19 monographs, published between 1977 and 2015. As a whole, they show the eminently local dimension of this traditional sport, the support it has received from several public and private institutions, as well as its double dimension as a sport and as a tradition solidly rooted in Leonese culture.

  10. Semantic Annotation of Unstructured Documents Using Concepts Similarity

    Directory of Open Access Journals (Sweden)

    Fernando Pech

    2017-01-01

    Full Text Available There is a large amount of information in the form of unstructured documents, which poses challenges for information storage, search, and retrieval. This situation has given rise to several information search approaches. Some proposals take into account the contextual meaning of the terms specified in the query. Semantic annotation techniques can help to retrieve and extract information from unstructured documents. We propose a semantic annotation strategy for unstructured documents as part of a semantic search engine. In this proposal, ontologies are used to determine the context of the entities specified in the query. Our strategy for extracting the context is focused on concepts similarity. Each relevant term of the document is associated with an instance in the ontology. The similarity between each of the explicit relationships is measured through the combination of two types of associations: the association between each pair of concepts and the calculation of the weight of the relationships.

  11. CARMA: Software for Continuous Affect Rating and Media Annotation

    Directory of Open Access Journals (Sweden)

    Jeffrey M. Girard

    2014-07-01

    Full Text Available CARMA is a media annotation program that collects continuous ratings while displaying audio and video files. It is designed to be highly user-friendly and easily customizable. Based on Gottman and Levenson’s affect rating dial, CARMA enables researchers and study participants to provide moment-by-moment ratings of multimedia files using a computer mouse or keyboard. The rating scale can be configured on a number of parameters including the labels for its upper and lower bounds, its numerical range, and its visual representation. Annotations can be displayed alongside the multimedia file and saved for easy import into statistical analysis software. CARMA provides a tool for researchers in affective computing, human-computer interaction, and the social sciences who need to capture the unfolding of subjective experience and observable behavior over time.

  12. Annotation of glycoproteins in the SWISS-PROT database.

    Science.gov (United States)

    Jung, E; Veuthey, A L; Gasteiger, E; Bairoch, A

    2001-02-01

    SWISS-PROT is a protein sequence database, which aims to be nonredundant, fully annotated and highly cross-referenced. Most eukaryotic gene products undergo co- and/or post-translational modifications, and these need to be included in the database in order to describe the mature protein. SWISS-PROT includes information on many types of different protein modifications. As glycosylation is the most common type of post-translational protein modification, we are currently placing an emphasis on annotation of protein glycosylation in SWISS-PROT. Information on the position of the sugar within the polypeptide chain, the reducing terminal linkage as well as additional information on biological function of the sugar is included in the database. In this paper we describe how we account for the different types of protein glycosylation, namely N-linked glycosylation, O-linked glycosylation, proteoglycans, C-linked glycosylation and the attachment of glycosyl-phosphatidylinositol anchors to proteins.

  13. Image annotation by deep neural networks with attention shaping

    Science.gov (United States)

    Zheng, Kexin; Lv, Shaohe; Ma, Fang; Chen, Fei; Jin, Chi; Dou, Yong

    2017-07-01

    Image annotation is the task of assigning semantic labels to an image. Recently, deep neural networks with visual attention have been utilized successfully in many computer vision tasks. In this paper, we show that the conventional attention mechanism is easily misled by the salient class, i.e., the attended region always contains part of the image area describing the content of the salient class at different attention iterations. To this end, we propose a novel attention shaping mechanism, which aims to maximize the non-overlapping area between consecutive attention processes by taking into account the history of previous attention vectors. Several weighting policies are studied to utilize the history information in different manners. On two benchmark datasets, i.e., PASCAL VOC2012 and MIRFlickr-25k, the average precision is improved by up to 10% in comparison with the state-of-the-art annotation methods.

  14. An Assessment of Reliability of Dialogue Annotation Instructions

    Science.gov (United States)

    1977-01-01

    elsewhere (Mann, 1975). A brief description of each of the annotation categories is provided below to characterize the need for the reliability...Jaulin, Calcul et Formalisation dans les Sciences de l’Homme, Centre National de la Recherche Scientifique, Paris, 1968. Newell, A. and Simon, H. A...Urbana, IL 61801 Dr. John R. Anderson Dept. of Psychology Yale University New Haven, CT 06520 Armed Forces Staff College Norfolk, VA 23511

  15. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes

    OpenAIRE

    Ndah, Elvis; Jonckheere, Veronique; Giess, Adam; Valen, Eivind; Menschaert, Gerben; Van Damme, Petra

    2017-01-01

    Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate...

  16. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes

    OpenAIRE

    Ndah, Elvis; Jonckheere, Veronique; Giess, Adam; Valen, Eivind; Menschaert, Gerben; Van Damme, Petra

    2017-01-01

    Abstract Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to ...

  17. Grass buffers for playas in agricultural landscapes: An annotated bibliography

    Science.gov (United States)

    Melcher, Cynthia P.; Skagen, Susan K.

    2005-01-01

    This bibliography and associated literature synthesis (Melcher and Skagen, 2005) was developed for the Playa Lakes Joint Venture (PLJV). The PLJV sought compilation and annotation of the literature on grass buffers for protecting playas from runoff containing sediments, nutrients, pesticides, and other contaminants. In addition, PLJV sought information regarding the extent to which buffers may attenuate the precipitation runoff needed to fill playas, and avian use of buffers. We emphasize grass buffers, but we also provide information on other buffer types.

  18. An Annotated Checklist of the Mammals of Kuwait

    Directory of Open Access Journals (Sweden)

    Peter J. Cowan

    2013-12-01

    Full Text Available An annotated checklist of the mammals of Kuwait is presented, based on the literature, personal communications, a Kuwait website and a blog, and the author’s observations. Twenty-five species occur, a further four are uncommon or rare visitors, six used to occur, whilst another two are of doubtful provenance. This list should assist those planning desert rehabilitation, animal reintroduction and protected area projects in Kuwait.

  19. Small molecule annotation for the Protein Data Bank.

    Science.gov (United States)

    Sen, Sanchayita; Young, Jasmine; Berrisford, John M; Chen, Minyu; Conroy, Matthew J; Dutta, Shuchismita; Di Costanzo, Luigi; Gao, Guanghua; Ghosh, Sutapa; Hudson, Brian P; Igarashi, Reiko; Kengaku, Yumiko; Liang, Yuhe; Peisach, Ezra; Persikova, Irina; Mukhopadhyay, Abhik; Narayanan, Buvaneswari Coimbatore; Sahni, Gaurav; Sato, Junko; Sekharan, Monica; Shao, Chenghua; Tan, Lihua; Zhuravleva, Marina A

    2014-01-01

    The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the ftp archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors. © The Author(s) 2014. Published by Oxford University Press.

  20. Annotation of the Protein Coding Regions of the Equine Genome.

    Directory of Open Access Journals (Sweden)

    Matthew S Hestand

    Full Text Available Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced mRNA from a pool of forty-three different tissues. From these, we derived the structures of 68,594 transcripts. In addition, we identified 301,829 positions with SNPs or small indels within these transcripts relative to EquCab2. Interestingly, 780 variants extend the open reading frame of the transcript and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross-species transcriptional and genomic comparisons.