WorldWideScience

Sample records for annotation est-ssr characterization

  1. De novo assembly of transcriptome sequencing in Caragana korshinskii Kom. and characterization of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Yan Long

    Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no genomic information or EST-SSR markers have been reported for the genus Caragana Fabr. In this study, more than two billion bases of high-quality C. korshinskii sequence were generated using Illumina sequencing technology, demonstrating de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp). The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarities to known proteins from the NCBI non-redundant and Swiss-Prot protein databases, respectively. Among these annotated unigenes, 26,232 were assigned to the Gene Ontology (GO) database. When 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, 5,598 unigenes were assigned to 5 main categories comprising 32 KEGG pathways. Among the main KEGG categories, metabolism was the largest (2,862, 43.7%), suggesting active metabolic processes in this desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characteristics of these EST-SSRs were compared with those of four other species in Fabaceae. A total of 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among 9 germplasms of the genus Caragana Fabr., the PCR success rate was 93.7%, and a phylogenetic tree was constructed based on the genotypic data. This research generated a substantial fraction of the transcriptome sequences, which are useful resources for gene annotation and discovery, molecular marker development, and genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding.

  2. Development and Characterization of EST-SSR Markers in Ostryopsis (Betulaceae)

    Directory of Open Access Journals (Sweden)

    Bing-Bing Liu

    2014-02-01

    Premise of the study: A set of expressed sequence tag (EST) microsatellite markers was developed and characterized using next-generation sequencing technology for the Chinese genus Ostryopsis (Betulaceae). Methods and Results: A total of 38 high-quality simple sequence repeat (SSR) primers were identified, of which 15 could be successfully amplified. Subsequently, we selected 80 individuals representing the three species of the genus to evaluate the efficacy of these markers for future studies of genetic diversity in each species. The number of alleles per locus ranged from one to nine, with an average of 3.8. The expected heterozygosity and observed heterozygosity per locus varied from 0 to 0.829 and from 0 to 1, with mean values of 0.483 and 0.416, respectively. Conclusions: These EST-SSR markers will be useful for evaluating the range-wide genetic diversity of each species and examining genetic divergence and gene flow among the three species.

  3. Genetic characterization of an elite coffee germplasm assessed by gSSR and EST-SSR markers.

    Science.gov (United States)

    Missio, R F; Caixeta, E T; Zambolim, E M; Pena, G F; Zambolim, L; Dias, L A S; Sakiyama, N S

    2011-10-06

    Coffee is one of the main agrifood commodities traded worldwide. In 2009, coffee accounted for 6.1% of the value of Brazilian agricultural production, generating a revenue of US$6 billion. Despite the importance of coffee production in Brazil, it is supported by a narrow genetic base, with few accessions. Molecular differentiation and diversity of a coffee breeding program were assessed with gSSR and EST-SSR markers. The study comprised 24 coffee accessions according to their genetic origin: arabica accessions (six traditional genotypes of C. arabica), resistant arabica (six leaf rust-resistant C. arabica genotypes with introgression of Híbrido de Timor), robusta (five C. canephora genotypes), Híbrido de Timor (three C. arabica x C. canephora), triploids (three C. arabica x C. racemosa), and racemosa (one C. racemosa). Allele and polymorphism analysis, AMOVA, the Student t-test, Jaccard's dissimilarity coefficient, cluster analysis, correlation of genetic distances, and discriminant analysis, were performed. EST-SSR markers gave 25 exclusive alleles per genetic group, while gSSR showed 47, which will be useful for differentiating accessions and for fingerprinting varieties. The gSSR markers detected a higher percentage of polymorphism among (35% higher on average) and within (42.9% higher on average) the genetic groups, compared to EST-SSR markers. The highest percentage of polymorphism within the genetic groups was found with gSSR markers for robusta (89.2%) and for resistant arabica (39.5%). It was possible to differentiate all genotypes including the arabica-related accessions. Nevertheless, combined use of gSSR and EST-SSR markers is recommended for coffee molecular characterization, because EST-SSRs can provide complementary information.
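
    The abstract above lists Jaccard's dissimilarity coefficient among its analyses but does not spell out the calculation. As a minimal sketch, assuming each accession is scored as the set of alleles (bands) it carries, the coefficient could be computed as below. The allele labels and the two example accessions are hypothetical and are not taken from the study.

```python
def jaccard_dissimilarity(alleles_a, alleles_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of scored alleles."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        # Two accessions with no scored alleles are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical presence data: "marker_fragmentSize" labels for two accessions.
accession_1 = {"M1_150", "M1_156", "M2_210"}
accession_2 = {"M1_150", "M2_210", "M2_214"}

dissimilarity = jaccard_dissimilarity(accession_1, accession_2)
print(dissimilarity)  # 2 shared alleles out of 4 distinct alleles
```

    Shared absences do not contribute to the coefficient, which is one common reason for preferring it with presence/absence marker data.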

  4. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.)

    Directory of Open Access Journals (Sweden)

    Yang Jun-Bo

    2010-12-01

    Background: The castor bean (Ricinus communis L.), a monotypic species in the spurge family (Euphorbiaceae, 2n = 20), is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85%) in its seed oil, castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesives and biodiesel. Owing to the lack of efficient molecular markers, little is known about the population genetic diversity and the genetic relationships among castor bean germplasm. Efficient and robust molecular markers are increasingly needed for breeding and improving castor bean varieties. The advent of modern genomics has produced large amounts of publicly available DNA sequence data. In particular, expressed sequence tags (ESTs) provide valuable resources for developing gene-associated SSR markers. Results: In total, 18,928 publicly available non-redundant castor bean EST sequences, representing approximately 17.03 Mb, were evaluated, and 7,732 SSR sites in 5,122 ESTs were identified by data mining. Castor bean exhibited a considerably high frequency of EST-SSRs. We developed and characterized 118 polymorphic EST-SSR markers from 379 primer pairs flanking repeats by screening 24 castor bean samples collected from different countries. A total of 350 alleles were identified from the 118 polymorphic SSR loci, ranging from 2 to 6 per locus (A), with an average of 2.97. The EST-SSR markers developed displayed moderate gene diversity (He), with an average of 0.41. Genetic relationships among the 24 germplasms were investigated using the genotypes of the 350 alleles, showing a geographic pattern of genotypes across the genetic diversity centers of castor bean. Conclusion: Castor bean EST sequences exhibited a considerably high frequency of SSR sites and were rich resources for developing EST-SSR markers. These EST-SSR markers would be particularly

  5. Development and characterization of polymorphic EST-SSR and genomic SSR markers for Tibetan annual wild barley.

    Science.gov (United States)

    Zhang, Mian; Mao, Weihua; Zhang, Guoping; Wu, Feibo

    2014-01-01

    Tibetan annual wild barley is rich in genetic variation. This study aimed to exploit new SSRs, identified by data mining, for genetic diversity and phylogenetic analysis of wild barley. We developed 49 novel EST-SSRs and confirmed 20 genomic SSRs for 80 Tibetan annual wild barley and 16 cultivated barley accessions. A total of 213 alleles were generated from 69 loci with an average of 3.14 alleles per locus. The trimeric repeats were the most abundant motifs (40.82%) among the EST-SSRs, while the majority of the genomic SSRs were di-nucleotide repeats. The polymorphic information content (PIC) ranged from 0.08 to 0.75 with a mean of 0.46. In addition, the expected heterozygosity (He) ranged from 0.0854 to 0.7842 with an average of 0.5279. Overall, the polymorphism of genomic SSRs was higher than that of EST-SSRs. Furthermore, the number of alleles and the PIC of wild barley were both higher than those of cultivated barley (3.12 vs 2.59 and 0.44 vs 0.37, respectively), indicating that more polymorphism existed in the Tibetan wild barley than in cultivated barley. The 96 accessions were divided into eight subpopulations based on 69 SSR markers, and the cultivated genotypes could be clearly separated from the wild barleys. A total of 47 SSR-containing EST unigenes showed significant similarities to known genes. These EST-SSR markers have potential for application in germplasm appraisal, genetic diversity and population structure analysis, facilitating marker-assisted breeding and crop improvement in barley. PMID:24736399
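
    Several records in this listing report expected heterozygosity (He) and polymorphic information content (PIC) per locus without stating the formulas. The sketch below uses the commonly cited definitions, He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus. Whether the study above used exactly these estimators (for example, with or without a small-sample correction) is an assumption, and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
locus_freqs = [0.5, 0.5]
he_value = expected_heterozygosity(locus_freqs)
pic_value = polymorphic_information_content(locus_freqs)
print(he_value, pic_value)
```

    Because the cross term is never negative, PIC is at most He for the same locus, which is consistent with the mean PIC (0.46) being lower than the mean He (0.5279) reported above.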

  6. Development and characterization of polymorphic EST-SSR and genomic SSR markers for Tibetan annual wild barley.

    Directory of Open Access Journals (Sweden)

    Mian Zhang

    Tibetan annual wild barley is rich in genetic variation. This study aimed to exploit new SSRs, identified by data mining, for genetic diversity and phylogenetic analysis of wild barley. We developed 49 novel EST-SSRs and confirmed 20 genomic SSRs for 80 Tibetan annual wild barley and 16 cultivated barley accessions. A total of 213 alleles were generated from 69 loci with an average of 3.14 alleles per locus. The trimeric repeats were the most abundant motifs (40.82%) among the EST-SSRs, while the majority of the genomic SSRs were di-nucleotide repeats. The polymorphic information content (PIC) ranged from 0.08 to 0.75 with a mean of 0.46. In addition, the expected heterozygosity (He) ranged from 0.0854 to 0.7842 with an average of 0.5279. Overall, the polymorphism of genomic SSRs was higher than that of EST-SSRs. Furthermore, the number of alleles and the PIC of wild barley were both higher than those of cultivated barley (3.12 vs 2.59 and 0.44 vs 0.37, respectively), indicating that more polymorphism existed in the Tibetan wild barley than in cultivated barley. The 96 accessions were divided into eight subpopulations based on 69 SSR markers, and the cultivated genotypes could be clearly separated from the wild barleys. A total of 47 SSR-containing EST unigenes showed significant similarities to known genes. These EST-SSR markers have potential for application in germplasm appraisal, genetic diversity and population structure analysis, facilitating marker-assisted breeding and crop improvement in barley.

  7. Isolation and Characterization of Novel Genomic and EST-SSR Markers in Coreoperca whiteheadi Boulenger and Cross-Species Amplification

    Directory of Open Access Journals (Sweden)

    YaQi Dou

    2012-10-01

    We described and characterized 11 expressed sequence tag (EST)-derived simple sequence repeats (SSRs) and seven genomic (G)-derived SSRs in Coreoperca whiteheadi Boulenger. The EST-SSRs comprised 62.2% di-nucleotide repeats, 32.2% tri-nucleotide repeats and 5.5% tetra-nucleotide repeats, whereas the majority of the G-SSRs were tri-nucleotide repeats (81.4%). The number of alleles for the 18 loci ranged from 3 to 6, with a mean of 3.8 alleles per locus. The observed (Ho) and expected (He) heterozygosity values ranged from 0.375 to 1.000 and from 0.477 to 0.757, respectively. The polymorphic information content (PIC) values ranged from 0.466 to 0.706. The mean number of alleles, Ho, He, and PIC of the EST-SSRs were higher than those of the G-SSRs. Four microsatellite loci deviated significantly from Hardy-Weinberg equilibrium (HWE) after Bonferroni correction, and no significant linkage disequilibrium (LD) was observed. These loci are the first to be characterized in C. whiteheadi and should be useful for genetic evaluation in conservation studies. Compared with 11 loci in C. whiteheadi, 37 potential polymorphic EST-SSRs were found in Siniperca chuatsi (Basilewsky), which will provide a valuable tool for mapping studies and molecular breeding programs in S. chuatsi.

  8. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development.

    Science.gov (United States)

    Wei, Zunzheng; Sun, Zhenzhen; Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian; Zhou, Di

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. 'Rehmannii' using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia. PMID:27635342

  9. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development.

    Science.gov (United States)

    Wei, Zunzheng; Sun, Zhenzhen; Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian; Zhou, Di

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. 'Rehmannii' using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia.

  10. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development

    Science.gov (United States)

    Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. ‘Rehmannii’ using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia.

  11. Development, chromosome location and genetic mapping of EST-SSR markers in wheat

    Institute of Scientific and Technical Information of China (English)

    CHEN Haimei; LI Linzhi; WEI Xianyun; LI Sishen; LEI Tiandong; HU Haizhou; WANG Honggang; ZHANG Xiansheng

    2005-01-01

    A total of 151,695 wheat expressed sequence tags (ESTs) deposited in GenBank/dbEST from July 14, 2003 to August 24, 2004 were searched for simple sequence repeats (SSRs) with motifs of 2 to 5 bp, and 2,038 EST-derived SSRs (EST-SSRs), accounting for 1.34% of the EST database, were identified. Based on these SSR sequences, 249 EST-SSR primer pairs were designed, of which 166 amplified clear bands in various wheat cultivars. These EST-SSR markers can be used as new molecular markers in wheat and related species. Using Chinese Spring nulli-tetrasomic lines, 93 EST-SSR primer pairs and 193 EST-SSR loci were located on 19 wheat chromosomes, that is, all chromosomes except 4A and 4B. Forty-three loci were mapped on 11 chromosomes of the genetic framework map previously constructed using recombinant inbred lines.
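
    The abstract above describes searching EST sequences for SSRs with motifs of 2 to 5 bp but does not name the search tool or the repeat thresholds. The following is a minimal regular-expression sketch of that kind of search, not the authors' method. The minimum repeat counts (6, 5, 4, 4 for 2-, 3-, 4- and 5-bp motifs) and the example sequence are assumptions chosen for illustration only.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence, min_repeats=None):
    """Return (motif, repeat_count, start) tuples for perfect SSRs, sorted by start."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(thresholds.items()):
        # A motif followed by at least (min_count - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # skip homopolymers and motifs built from a shorter unit
            repeat_count = len(match.group(0)) // motif_len
            hits.append((motif, repeat_count, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical EST fragment containing (AT)7 and (AAG)5.
example_est = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"
ssr_hits = find_ssrs(example_est)
print(ssr_hits)
```

    The primitive-motif check keeps a dinucleotide run such as (AT)8 from being reported a second time as (ATAT)4. Imperfect or compound repeats are not handled in this sketch.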

  12. A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica

    Directory of Open Access Journals (Sweden)

    Ueno Saneyoshi

    2012-04-01

    Background: Microsatellites or simple sequence repeats (SSRs) in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating the development of a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica). Results: We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. 4,059 SSRs were found in 3,694 (4.54%) contigs, giving an SSR frequency lower than that in seven other plant species with gene indices (5.4–21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared to 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, which was found in 655 (46.3%) di-SSRs, followed by the AAG motif, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, but 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3′ untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented within SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly developed CMiB, which combines several open tools. Markers resulting from both pipelines showed no differences in PCR success rate and polymorphisms, but PCR success and polymorphism were significantly affected by the expected PCR product size

  13. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11-SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection, and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.

  14. Construction of full-length cDNA library and development of EST-derived simple sequence repeat (EST-SSR) markers in Senecio scandens.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Lu, Jian; Zhang, Zhen; Wang, Lei; Xu, Delin

    2014-12-01

    Senecio scandens Buch.-Ham. ex D. Don (Compositae) is a crucial source of Chinese traditional medicine with antibacterial properties. We constructed a cDNA library and obtained expressed sequence tags (ESTs) to show the distribution of gene ontology annotations for mRNAs, using an individual plant with superior antibacterial characteristics. Analysis of comparative genomics indicates that the putative uncharacterized proteins (21.07%) might be derived from "molecular function unknown" clones or rare transcripts. Furthermore, the Compositae had high cross-species transferability of EST-derived simple sequence repeats (EST-SSR), based on valid amplifications of 206 primer pairs developed from the newly assembled expressed sequence tag sequences in Artemisia annua L. Among those EST-SSR markers, 52 primers showed polymorphic amplifications between individuals with contrasting diverse antibacterial traits. Our sequence data and molecular markers will be cost-effective tools for further studies such as genome annotation, molecular breeding, and novel transcript profiles within Compositae species. PMID:25007751

  15. Development of EST-SSR markers in Barringtonia racemosa (Lecythidaceae) and cross-amplification in related species

    Science.gov (United States)

    Xie, Hongxian; Yuan, Yang; Fang, Xiaoting; Liu, Ying; Yang, Chao; Jin, Jianhua; Tan, Fengxiao; Huang, Yelin

    2015-01-01

    Premise of the study: Microsatellite markers were identified and characterized to study the genetic diversity and structure of Barringtonia racemosa (Lecythidaceae). Methods and Results: Based on the transcriptome data of B. racemosa, 30 primer pairs were initially designed and tested, of which 15 were successfully amplified and displayed clear polymorphisms across the 43 individuals from three distant populations tested in the study. The results showed that the number of alleles per locus ranged from two to seven and the expected heterozygosity and observed heterozygosity per locus varied from 0 to 0.772 and from 0 to 0.933, respectively. Conclusions: The expressed sequence tag–simple sequence repeat (EST-SSR) markers described here will be useful for studying genetic diversity and structure of B. racemosa. Furthermore, all loci were successfully cross-amplified in B. asiatica and B. acutangula and will be of great value for genetic studies across this genus. PMID:26697277

  16. Development, cross-species/genera transferability of novel EST-SSR markers and their utility in revealing population structure and genetic diversity in sugarcane

    KAUST Repository

    Singh, Ram K.

    2013-07-01

    Sugarcane (Saccharum spp. hybrid), with its complex polyploid genome, requires a large number of informative DNA markers for various applications in genetics and breeding. Despite the great advances in genomic technology, the availability of molecular tools such as microsatellite markers is limited in several crop species, especially in sugarcane. Nowadays EST-SSR markers are preferred to genomic SSRs (gSSRs) because they represent only the functional part of the genome, which can be easily associated with a desired trait. The present study was taken up with a new set of 351 EST-SSRs developed from 4,085 non-redundant EST sequences of two Indian sugarcane cultivars. Among these EST-SSRs, TNR-containing motifs were predominant, with a frequency of 51.6%. Thirty percent of the EST-SSRs showed homology with annotated proteins. A high frequency of SSRs was found in the 5′ UTR and in the ORF (about 27%), and a low frequency was observed in the 3′ UTR (about 8%). Two hundred twenty-seven EST-SSRs were evaluated in sugarcane, allied genera of sugarcane and cereals, and 134 of these revealed polymorphism, with PIC values ranging from 0.12 to 0.99. The cross-transferability rate ranged from 87.0% to 93.4% in the Saccharum complex, 80.0% to 87.0% in allied genera, and 76.0% to 80.0% in cereals. Cloning and sequencing of EST-SSR size-variant amplicons revealed that variation in the number of repeat units was the main source of EST-SSR fragment polymorphism. When 124 sugarcane accessions were analyzed for population structure using a model-based approach, seven genetically distinct groups or admixtures thereof were observed in sugarcane. Principal coordinate analysis and UPGMA, used to evaluate genetic relationships, also delineated the 124 accessions into seven groups. Thus, the high level of polymorphism, adequate genetic diversity and population structure assayed with the EST-SSR markers not only suggested their utility in various applications in genetics and genomics in

  17. Transferability and polymorphism of barley EST-SSR markers used for phylogenetic analysis in Hordeum chilense

    Directory of Open Access Journals (Sweden)

    Dorado Gabriel

    2008-09-01

    Background: Hordeum chilense, a native South American diploid wild barley, is a potential source of useful genes for cereal breeding. The use of this wild species to increase genetic variation in cereals will be greatly facilitated by marker-assisted selection. Different economically feasible approaches have been undertaken for this wild species with limited direct agricultural use in a search for suitable and cost-effective markers. The availability of expressed sequence tag (EST)-derived microsatellite or simple sequence repeat (SSR) markers, commonly called EST-SSRs, for barley (Hordeum vulgare) represents a promising source to increase the number of genetic markers available for the H. chilense genome. Results: All of the 82 barley EST-derived SSR primer pairs tested for transferability to H. chilense amplified products of the correct size from this species. Of these 82 barley EST-SSRs, 21 (26%) showed polymorphism among H. chilense lines. The identified polymorphic markers were used to test transferability and polymorphism in other Poaceae species with the aim of establishing H. chilense phylogenetic relationships. Triticum aestivum-H. chilense addition lines allowed us to determine the chromosomal localizations of the EST-SSR markers and confirm conservation of the linkage group. Conclusion: In the present study, a set of 21 polymorphic EST-SSR markers was identified as useful for diversity analysis of H. chilense and related wild barleys such as H. murinum, and for wheat marker-assisted introgression breeding. Across-genera transferability of the barley EST-SSR markers has allowed phylogenetic inference within the Triticeae complex.

  18. Assessment of genetic diversity in the sorghum reference set using EST-SSR markers.

    Science.gov (United States)

    Ramu, P; Billot, C; Rami, J-F; Senthilvel, S; Upadhyaya, H D; Ananda Reddy, L; Hash, C T

    2013-08-01

    Selection and use of genetically diverse genotypes are key factors in any crop breeding program to develop cultivars with a broad genetic base. Molecular markers play a major role in selecting diverse genotypes. In the present study, a reference set representing a wide range of sorghum genetic diversity was screened with 40 EST-SSR markers to validate both the use of these markers for genetic structure analyses and the population structure of this set. The grouping of accessions was identical in distance-based and model-based clustering methods. Genotypes were grouped primarily based on race within the geographic origins. Accessions derived from the African continent contributed 88.6 % of alleles, confirming the African origin of sorghum. In total, 360 alleles were detected in the reference set with an average of 9 alleles per marker. The average PIC value was 0.5230 with a range of 0.1379-0.9483. The sub-race guinea margaritiferum (Gma) from West Africa formed a separate cluster in close proximity to wild accessions, suggesting that the Gma group represents an independent domestication event. Guineas from India and Western Africa formed two distinct clusters. Accessions belonging to the kafir race formed the most homogeneous group, as observed in earlier studies. This analysis suggests that the EST-SSR markers used in the present study have greater discriminating power than the genomic SSRs. Genetic variance within the subpopulations was very high (71.7 %), suggesting that the germplasm lines included in the set are more diverse. Thus, this reference set representing the global germplasm is an ideal material for the breeding community, serving as a community resource for trait-specific allele mining as well as genome-wide association mapping.

  19. tropiTree: an NGS-based EST-SSR resource for 24 tropical tree species.

    Science.gov (United States)

    Russell, Joanne R; Hedley, Peter E; Cardle, Linda; Dancey, Siobhan; Morris, Jenny; Booth, Allan; Odee, David; Mwaura, Lucy; Omondi, William; Angaine, Peter; Machua, Joseph; Muchugi, Alice; Milne, Iain; Kindt, Roeland; Jamnadass, Ramni; Dawson, Ian K

    2014-01-01

    The development of genetic tools for non-model organisms has been hampered by cost, but advances in next-generation sequencing (NGS) have created new opportunities. In ecological research, this raises the prospect for developing molecular markers to simultaneously study important genetic processes such as gene flow in multiple non-model plant species within complex natural and anthropogenic landscapes. Here, we report the use of bar-coded multiplexed paired-end Illumina NGS for the de novo development of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers at low cost for a range of 24 tree species. Each chosen tree species is important in complex tropical agroforestry systems where little is currently known about many genetic processes. An average of more than 5,000 EST-SSRs was identified for each of the 24 sequenced species, whereas prior to analysis 20 of the species had fewer than 100 nucleotide sequence citations. To make results available to potential users in a suitable format, we have developed an open-access, interactive online database, tropiTree (http://bioinf.hutton.ac.uk/tropiTree), which has a range of visualisation and search facilities, and which is a model for the efficient presentation and application of NGS data.

  20. SSR and EST-SSR-based genetic linkage map of cassava (Manihot esculenta Crantz).

    Science.gov (United States)

    Sraphet, Supajit; Boonchanawiwat, Athipong; Thanyasiriwat, Thanwanit; Boonseng, Opas; Tabata, Satoshi; Sasamoto, Shigemi; Shirasawa, Kenta; Isobe, Sachiko; Lightfoot, David A; Tangphatsornruang, Sithichoke; Triwitayakorn, Kanokporn

    2011-04-01

    Simple sequence repeat (SSR) markers provide a powerful tool for genetic linkage map construction that can be applied for identification of quantitative trait loci (QTL). In this study, a total of 640 new SSR markers were developed from an enriched genomic DNA library of the cassava variety 'Huay Bong 60' and 1,500 novel expressed sequence tag-simple sequence repeat (EST-SSR) loci were developed from the GenBank database. To construct a genetic linkage map of cassava, a mapping population of 100 F(1) lines was developed from the cross 'Huay Bong 60' by 'Hanatee'. Polymorphism screening between the parental lines identified 199 SSRs and 168 EST-SSRs as novel polymorphic markers. Combining these with previously developed SSRs, we report a linkage map consisting of 510 markers encompassing 1,420.3 cM, distributed on 23 linkage groups with a mean distance between markers of 4.54 cM. Comparison of the SSR order on the cassava linkage map with the cassava genome sequences allowed us to locate 284 scaffolds on the genetic map. Although the number of linkage groups reported here shows that this F(1) genetic linkage map is not yet saturated, it encompassed around 88% of the cassava genome, indicating that the map was almost complete. Therefore, sufficient markers now exist to encompass most of the genome and efficiently map traits in cassava.

  1. First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers.

    Science.gov (United States)

    Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique

    2016-01-01

    Taraxacum koksaghyz Rodin (TKS) has been studied on many occasions as a possible alternative source for good-quality natural rubber production and for inulin production. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and on inulin applications for health improvement and in the food industry. Only limited genomic resources exist for TKS, and in particular no genetic linkage map is available for this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. Blasting the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. Blast searches against plant gene databases revealed some homologies with useful genes for downstream applications in the future. PMID:27488242

  3. Genetic diversity revealed by genomic-SSR and EST-SSR markers among common wheat, spelt and compactum

    Institute of Scientific and Technical Information of China (English)

    YANG Xinquan; LIU Peng; HAN Zongfu; NI Zhongfu; SUN Qixin

    2005-01-01

    In this study, two types of SSR molecular markers, genomic-SSR and EST-SSR, are used to measure the genetic diversity among three hexaploid wheat populations, which include 28 common wheat (Triticum aestivum L.), 13 spelt (Triticum spelta L.), and 11 compactum (Triticum compactum Host.). The results show that common wheat has the highest genetic polymorphism, followed by spelt and then compactum. The mean genetic distance between the populations is higher than that within a population, and a similar tendency is detected for the individual genomes A, B and D. Therefore, spelt and compactum can be used as potential germplasm for wheat breeding, especially for enriching the genetic variation in genome D. Compared with spelt, compactum shows much less genetic divergence from common wheat, indicating a closer relationship between these two species. Although the polymorphism revealed by EST-SSR is lower than that revealed by genomic-SSR, EST-SSR can also effectively differentiate diverse genotypes. Together with our present results, it is concluded that the EST-SSR marker is an ideal marker for assessing genetic diversity in wheat. The origin and evolution of hexaploid wheat are also analyzed and discussed.

  4. Development of Soybean EST-SSR Markers and Their Use to Assess Genetic Diversity in the Subgenus Soja

    Institute of Scientific and Technical Information of China (English)

    LIU Yu-lin; LI Ying-hui; ZHOU Guo-an; Uzokwe N; CHANG Ru-zhen; CHEN Shou-yi; QIU Li-juan

    2010-01-01

    Developing expressed sequence tag-derived SSR (EST-SSR) markers is imperative in genetic research. In this paper, we report 37 EST-SSR markers developed from 286 unigenes obtained from a soybean cDNA library. Among the 286 markers designed and tested on 4 accessions of Glycine max and 6 of its wild progenitor (G. soja) within the subgenus Soja, 209 markers amplified DNA fragments (73.1%), and 37 markers were polymorphic (12.9% of the total). The 37 loci detected a total of 142 alleles, and the PIC values varied from 0.194 to 0.794. Both the number of alleles per locus and the PIC value were significantly related to the SSR motif. Six EST-SSR loci may be fixed for different alleles between G. max and G. soja since they were particularly polymorphic among the 6 G. soja accessions. A neighbor-joining tree placed the G. max accessions together as a group within G. soja, though the average genetic distance among G. soja accessions was much higher. These new EST-SSR markers will be useful for genetic diversity analysis, genetic map construction and gene discovery in the subgenus Soja.
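
    The amplification and polymorphism rates in this record are simple proportions of the designed primer pairs. As a quick consistency check, the hypothetical helper below (its name and return keys are illustrative, not from the paper) recomputes them from the counts given in the abstract.

```python
def screening_rates(designed, amplified, polymorphic):
    """Percentage of designed primer pairs that amplified and that were polymorphic."""
    if designed <= 0 or not (0 <= polymorphic <= amplified <= designed):
        raise ValueError("expected 0 <= polymorphic <= amplified <= designed, designed > 0")
    return {
        "amplified_pct": round(100 * amplified / designed, 1),
        "polymorphic_pct": round(100 * polymorphic / designed, 1),
    }


# Counts taken from the abstract: 286 designed, 209 amplified, 37 polymorphic.
print(screening_rates(286, 209, 37))
# {'amplified_pct': 73.1, 'polymorphic_pct': 12.9}
```

    Both percentages use the number of designed markers as the denominator, which matches the figures reported in the abstract.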

  5. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat.

    Science.gov (United States)

    Gupta, P K; Rustgi, S; Sharma, S; Singh, R; Kumar, N; Balyan, H S

    2003-12-01

    Nearly 900 SSRs (simple sequence repeats) were identified among 15,000 ESTs (expressed sequence tags) belonging to bread wheat (Triticum aestivum L.). The SSRs were defined by their minimum length, which ranged from 14 to 21 bp. The maximum length ranged from 24 to 87 bp depending upon the length of the repeat unit itself (1-7 bp). The average density of SSRs was one SSR per 9.2 kb of EST sequence screened. The trinucleotide repeats were the most abundant SSRs detected. As a representative sample, 78 primer pairs were designed, which were also used to screen the dbEST entries for Hordeum vulgare and Triticum tauschii (donor of the D-genome of cultivated wheat) using a cut-off E (expectation) value of 0.01. On the basis of in silico analysis, up to 55.12% of the primer pairs exhibited transferability from Triticum to Hordeum, indicating that the sequences flanking the SSRs are conserved not only within a single genus but also between related genera in Poaceae. Primer pairs for the 78 SSRs were synthesized and used successfully for the study of (1) their transferability to 18 related wild species and five cereal species (barley, oat, rye, rice and maize); and (2) polymorphism between the parents of four mapping populations available to us. A subset of 20 EST-SSR primers was also used to assess genetic diversity in a collection of 52 elite exotic wheat genotypes. This was done to compare their utility with that of other molecular markers (gSSRs, AFLPs, and SAMPL) previously used by us for the same purpose with the same set of 52 bread wheat genotypes. Although only a low level of polymorphism was detected, relative to that observed with genomic SSRs, the study suggested that EST-SSRs can be successfully used for a variety of purposes, and may actually prove superior to SSR markers extracted from genomic libraries for diversity estimation and transferability. PMID:14508680
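
    Defining SSRs by repeat-unit length and a minimum tract length, as described above, can be illustrated with a small regular-expression scan. This is a sketch only: the thresholds in MIN_REPEATS are illustrative assumptions rather than the criteria used by the authors, the function names are hypothetical, and dedicated tools handle compound and imperfect repeats that this code ignores.

```python
import re

# Assumed minimum repeat counts per unit length (illustrative only).
MIN_REPEATS = {1: 14, 2: 7, 3: 5, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n_min in min_repeats.items():
        # One captured unit followed by at least n_min - 1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. (AGAG)n, already reported as (AG)n
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical EST fragment containing an (AG)8 and a (GAA)5 tract.
est = "GGC" + "AG" * 8 + "TTC" + "GAA" * 5 + "C"
print(find_ssrs(est))  # [(3, 'AG', 8), (22, 'GAA', 5)]
```

    Dividing the total length of screened sequence by the number of hits gives a density figure of the kind quoted in the abstract (one SSR per so many kb).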

  6. Assessment of EST-SSR markers for genetic analysis in coffee

    Directory of Open Access Journals (Sweden)

    Robson Fernando Missio

    2009-09-01

    Full Text Available EST-SSR markers were used to investigate the genetic diversity among and within coffee populations, to explore the possibility of their use for fingerprinting of cultivars and to assist breeding programs. Seventeen markers, developed from ESTs (Expressed Sequence Tags) from the Brazilian Coffee Genome Project, were used. All markers showed polymorphism among the genotypes assessed. The average number of alleles per primer was 5.1. The highest polymorphisms were found within C. canephora (88.2%) and rust-resistant varieties (35.3%). About 29.4% of the markers differentiated C. arabica from Híbrido de Timor (HDT); it was also possible to identify the HDT genotypes closest to and farthest from C. arabica. The analysis of population-grouped genotypes revealed that 64.0% of the genetic diversity lay among populations and 36.0% within populations. The differentiation index was 0.637. Six markers distinguished four rust-resistant varieties, showing their fingerprinting potential. These results demonstrate the usefulness of EST-SSR markers for guiding crosses, in diversity and introgression studies, and in genetic mapping.

  7. Development and Characterization of EST-SSR Markers in the Chinese Medicinal Plant Callerya speciosa (Fabaceae)

    Directory of Open Access Journals (Sweden)

    Li Li

    2013-06-01

    Full Text Available Premise of the study: The first microsatellite primers were developed for Callerya speciosa, an important traditional medicinal plant with island-mainland distributions in China, to further investigate its genetic variability and population structure. Methods and Results: The microsatellite-containing sequences were selected from a cDNA library of C. speciosa. In total, 58 primer pairs were designed, and 25 of the corresponding loci showed clear amplification. Polymorphisms were assessed in two different natural populations. The number of alleles per locus ranged from two to nine. Observed and expected heterozygosity per locus ranged from 0.067 to 0.938 and 0.064 to 0.836, respectively. One out of 25 loci showed departure from Hardy–Weinberg equilibrium expectations in both populations, and three pairs of loci showed significant linkage disequilibrium after Bonferroni correction. Conclusions: These microsatellite markers will be useful tools for genetic and conservation studies and to understand the evolutionary processes in Callerya species.
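
    Observed and expected heterozygosity, as reported per locus above, can be computed directly from diploid genotype calls. The sketch below assumes the simple estimator He = 1 - sum(p_i^2) without a small-sample correction, which may differ from the software the authors used; the function name and the example genotypes (allele sizes in bp) are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Ho is the fraction of heterozygous individuals; He is 1 - sum(p_i^2),
    an assumed estimator with no small-sample correction.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return ho, he


# Hypothetical calls for four individuals at one EST-SSR locus.
calls = [(150, 150), (150, 154), (154, 154), (150, 154)]
print(heterozygosity(calls))  # (0.5, 0.5)
```

    Comparing Ho with He at each locus is the basis of the Hardy–Weinberg tests mentioned in the abstract; a large deficit of heterozygotes is one common sign of departure.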

  8. Exploiting Illumina Sequencing for the Development of 95 Novel Polymorphic EST-SSR Markers in Common Vetch (Vicia sativa subsp. sativa)

    Directory of Open Access Journals (Sweden)

    Zhipeng Liu

    2014-05-01

    Full Text Available The common vetch (Vicia sativa subsp. sativa), a self-pollinating and diploid species, is one of the most important annual legumes in the world due to its short growth period, high nutritional value, and multiple uses as hay, grain, silage, and green manure. The available simple sequence repeat (SSR) markers for common vetch, however, are insufficient to meet the growing demand for genetic and molecular research on this important species. Here, we aimed to develop and characterise polymorphic EST-SSR markers from the vetch Illumina transcriptome. A total of 1,071 potential EST-SSR markers were identified from 1,025 unigenes whose lengths were greater than 1,000 bp, and 450 primer pairs were then designed and synthesized. Finally, 95 polymorphic primer pairs were developed for the 10 common vetch accessions, which included 50 individuals. Among the 95 EST-SSR markers, the number of alleles ranged from three to 13, and the polymorphism information content values ranged from 0.09 to 0.98. The observed heterozygosity values ranged from 0.00 to 1.00, and the expected heterozygosity values ranged from 0.11 to 0.98. These 95 EST-SSR markers developed from the vetch Illumina transcriptome could greatly promote genetic and molecular breeding studies in this species.

  9. New analysis of EST-SSR distribution and development of EST-SSR markers in Salvia miltiorrhiza

    Institute of Scientific and Technical Information of China (English)

    王学勇; 周晓丽; 高伟; 崔光红; 黄璐琦; 刘春生

    2011-01-01

    Objective: To establish new EST-SSR markers for analyzing the genetic variation of different populations of Salvia miltiorrhiza. Method: A total of 1,408 ESTs, comprising sequences downloaded from GenBank up to November 2010 and sequences obtained by the HMPL laboratory, were processed with EGassembler to obtain high-quality ESTs; SSR loci were then searched and SSR types analyzed with SSRIT. EST-SSR primer pairs were designed with Websat, and PCR amplification conditions were optimized on DNA templates from S. miltiorrhiza samples of different geographic origins. Result: SSR loci were abundant and widely distributed in S. miltiorrhiza ESTs, with one SSR per 5.8 kb. The main repeat types were dinucleotide (63.0%) and trinucleotide (35.5%) repeats. CT/AG was the most frequent dinucleotide motif and GAA/TCC the most frequent trinucleotide motif. Of 36 primer pairs, 29 (80.5%) amplified successfully in all samples of S. miltiorrhiza, while the rest failed to give PCR products at various annealing temperatures and Mg2+ concentrations. The selected primer pairs also showed polymorphism among samples from different S. miltiorrhiza populations. Conclusion: The newly established EST-SSR markers show high SSR locus coverage and genetic polymorphism in S. miltiorrhiza populations and can be used for genetic variation analysis.

  10. Construction of an EST-SSR-based interspecific transcriptome linkage map of fibre development in cotton

    Indian Academy of Sciences (India)

    Chuanxiang Liu; Daojun Yuan; Zhongxu Lin

    2014-12-01

    Quantitative trait locus (QTL) mapping is an important method in marker-assisted selection breeding. Many studies on the QTLs focus on cotton fibre yield and quality; however, most are conducted at the DNA level, which may reveal null QTLs. Hence, QTL mapping based on transcriptome maps at the cDNA level is often more reliable. In this study, an interspecific transcriptome map of allotetraploid cotton was developed based on an F2 population (Emian22 × 3-79) by amplifying cDNA using EST-SSRs. The map was constructed using cDNA obtained from developing fibres at five days post anthesis (DPA). A total of 1270 EST-SSRs were screened for polymorphisms between the mapping parents. The resulting transcriptome linkage map contained 242 markers that were distributed in 32 linkage groups (26 chromosomes). The full length of this map is 1938.72 cM with a mean marker distance of 8.01 cM. The functions of some ESTs have been annotated by exploring homologous sequences. Some markers were related to the differentiation and elongation of cotton fibre, while most were related to the basic metabolism. This study demonstrates that constructing a transcriptome linkage map by amplifying cDNAs using EST-SSRs is a simple and practical method as well as a powerful tool to map eQTLs for fibre quality and other traits in cotton.
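
    The mean marker distance quoted above follows from the map length and the marker count. The hypothetical helper below shows the arithmetic under two conventions: dividing by the number of markers, which reproduces the 8.01 cM in the abstract, and dividing by the number of intervals (markers minus linkage groups), which is an alternative assumption and not the figure the authors report.

```python
def mean_marker_spacing(map_length_cm, n_markers, n_linkage_groups=0):
    """Average spacing in cM.

    With n_linkage_groups=0 the map length is divided by the marker count;
    otherwise it is divided by the number of intervals between adjacent
    markers, i.e. n_markers - n_linkage_groups.
    """
    denominator = n_markers - n_linkage_groups
    if denominator <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / denominator


# Figures from the abstract: 1938.72 cM, 242 markers, 32 linkage groups.
print(round(mean_marker_spacing(1938.72, 242), 2))      # 8.01
print(round(mean_marker_spacing(1938.72, 242, 32), 2))  # 9.23
```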

  11. A deep sequencing analysis of transcriptomes and the development of EST-SSR markers in mungbean (Vigna radiata)

    Indian Academy of Sciences (India)

    CHANGYOU LIU; BAOJIE FAN; ZHIMIN CAO; QIUZHU SU; YAN WANG; ZHIXIAO ZHANG; JING WU; JING TIAN

    2016-09-01

    Mungbean (Vigna radiata (L.) Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the three libraries were assembled into 56,471 transcripts. Following a cluster analysis, 43,293 unigenes were obtained with an average length of 739 bp and N50 length of 1176 bp. Of the unigenes, 34,903 (80.6%) had significant similarity to known proteins in the NCBI nonredundant protein database (Nr), while 21,450 (58.4%) had BLAST hits in the Swiss-Prot database (E-value < 10⁻⁵). Further, 1245 differentially expressed genes were detected among the three mungbean genotypes. In addition, we identified 3788 expressed sequence tag-simple sequence repeat (EST-SSR) motifs that could be used as potential molecular markers. Among 320 tested loci, 310 (96.5%) yielded amplification products, and 151 (47.0%) exhibited polymorphisms among six mungbean accessions. These transcriptome data and mungbean EST-SSRs could serve as a valuable resource for novel gene discovery and the marker-assisted selective breeding of this species.

  12. Genome evolution of intermediate wheatgrass as revealed by EST-SSR markers developed from its three progenitor diploid species.

    Science.gov (United States)

    Wang, Richard R-C; Larson, Steve R; Jensen, Kevin B; Bushman, B Shaun; DeHaan, Lee R; Wang, Shuwen; Yan, Xuebing

    2015-02-01

    Intermediate wheatgrass (Thinopyrum intermedium (Host) Barkworth & D.R. Dewey), a segmental autoallohexaploid (2n = 6x = 42), is not only an important forage crop but also a valuable gene reservoir for wheat (Triticum aestivum L.) improvement. Throughout the scientific literature, there continues to be disagreement as to the origin of the different genomes in intermediate wheatgrass. Genotypic data obtained from newly developed EST-SSR primers derived from the putative progenitor diploid species Pseudoroegneria spicata (Pursh) Á. Löve (St genome), Thinopyrum bessarabicum (Savul. & Rayss) Á. Löve (J = J(b) = E(b)), and Thinopyrum elongatum (Host) D. Dewey (E = J(e) = E(e)) indicate that the V genome of Dasypyrum (Coss. & Durieu) T. Durand is not one of the three genomes in intermediate wheatgrass. Based on all available information in the literature and findings in this study, the genomic designation of intermediate wheatgrass should be changed to J(vs)J(r)St, where J(vs) and J(r) represent ancestral genomes of present-day J(b) of Th. bessarabicum and J(e) of Th. elongatum, with J(vs) being more ancient. Furthermore, the information suggests that the St genome in intermediate wheatgrass is most similar to the present-day St found in diploid species of Pseudoroegneria from Eurasia.

  13. An EST-SSR based linkage map for Persea americana Mill. (avocado)

    Science.gov (United States)

    Recent enhancement of the pool of known molecular markers for avocado has allowed the construction of the first moderate density genetic map for this species. Over 300 microsatellite markers have been characterized and 163 of these were used to construct a map from the cross of two Florida cultivar...

  14. Functional molecular markers (EST-SSR) in the full-sib reciprocal recurrent selection program of maize (Zea mays L.).

    Science.gov (United States)

    Galvão, K S C; Ramos, H C C; Santos, P H A D; Entringer, G C; Vettorazzi, J C F; Pereira, M G

    2015-01-01

    This study aimed to improve grain yield in the full-sib reciprocal recurrent selection program of maize from the North Fluminense State University. In the current phase of the program, the goal is to maintain, or even increase, the genetic variability within and among populations, in order to increase heterosis of the 13th cycle of reciprocal recurrent selection. Microsatellite expressed sequence tags (EST-SSRs) were used as a tool to assist the maximization step of genetic variability, targeting the functional genome. Eighty S1 progenies of the 13th recurrent selection cycle, 40 from each population (CIMMYT and Piranão), were analyzed using 20 EST-SSR loci. Genetic diversity, observed heterozygosity, polymorphism information content, and inbreeding coefficient were estimated. Subsequently, analysis of genetic dissimilarity, molecular variance, and a graphical dispersion of genotypes were conducted. The number of alleles in the CIMMYT population ranged from 1 to 6, while in the Piranão population the range was from 2 to 8, with means of 3.65 and 4.35, respectively. Consistent with the number of alleles, the Shannon index showed greater diversity for the Piranão population (1.04) than for the CIMMYT population (0.89). The genic SSR markers were effective in clustering genotypes into their respective populations before selection, and an increase in the variation between populations after selection was observed. The results indicate that the study populations have considerable genetic diversity, which corresponds to the functional genome, indicating that this strategy may contribute to genetic gain, especially in association with the grain yield of future hybrids. PMID:26214413
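
    The Shannon index used above to compare the two populations summarises allele-frequency evenness at a locus. A minimal sketch, assuming the usual form I = -sum(p_i * ln p_i) with natural logarithms (the abstract does not state the log base or the software used; the function name and frequencies are hypothetical):

```python
import math


def shannon_index(allele_freqs):
    """Shannon information index for one locus: -sum(p * ln p) over alleles with p > 0."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return -sum(p * math.log(p) for p in allele_freqs if p > 0)


# Hypothetical loci: four equally frequent alleles versus one dominant allele.
even = shannon_index([0.25, 0.25, 0.25, 0.25])
skewed = shannon_index([0.85, 0.05, 0.05, 0.05])
print(even > skewed)  # True
```

    A population-level value such as those quoted in the abstract is typically the mean of the per-locus indices; more alleles at more even frequencies raise the index.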

  15. Development of EST-SSR Markers in Citrus aurantium L.

    Institute of Scientific and Technical Information of China (English)

    杨春霞; 温强; 叶金山; 朱培林

    2011-01-01

    The large collection of expressed sequence tags (ESTs) from Citrus aurantium L. available in public databases offers an opportunity to identify simple sequence repeats (SSRs) in ESTs by data mining. These sequences may provide an estimate of diversity in the expressed portion of the genome and may be useful for comparative mapping, for tagging important traits of interest, and for map-based cloning of important genes. We analyzed a total of 11,029 unigene sequences from the C. aurantium database using the online SSR identification software SSRIT. In total, 327 ESTs containing 348 SSRs were identified, accounting for 2.96% of the EST sequences. Trinucleotide repeats (161 in total) were the most abundant repeat class, accounting for 46.26% of all SSRs found. Based on the SSR-containing EST sequences, 58 primer pairs were designed using Primer 3.0. Of these, 36 primer pairs amplified DNA fragments and 6 exhibited polymorphism, accounting for 62.07% and 10.34% of the 58 designed pairs, respectively. The development of new EST-SSR markers from C. aurantium has potentially important implications for genetic analysis and exploitation of the genetic resources of C. aurantium and would provide a more direct estimate of functional diversity.

  16. Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Honglin Chen

    Full Text Available Mung bean (Vigna radiata (L.) Wilczek) is an important traditional food legume crop, with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value < 1.0E-5). A total of 6,585 (8.3%) were mapped onto 244 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Among the unigenes, 10,053 sequences contained a unique simple sequence repeat (SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of high-resolution genetic linkage maps, association...

  17. EST-SSR Analysis and Marker Development for Chrysanthemum morifolium

    Institute of Scientific and Technical Information of China (English)

    万志兵; 陈燕; 闫莹莹; 陈黎

    2013-01-01

    To develop molecular markers for Chrysanthemum morifolium, 7,087 ESTs were assembled into 275 contigs, in which 50 microsatellite (SSR) loci were detected, an average of one SSR locus per 2,854.3 bp of contig sequence. Trinucleotide repeats were the most abundant SSR type (50.00%). AC and AG were the most frequent motifs among dinucleotide repeats, and CAT and CCA among trinucleotide repeats, whereas (TTTN)n and (ATTTN)n were dominant among tetra- and pentanucleotide repeats, respectively. These dominant repeat motifs were rich in the bases A and T. Highly variable microsatellites (longer than 20 bp) accounted for about 2.00% of the SSRs detected in C. morifolium ESTs. Based on the SSR-containing EST sequences, 428 primer pairs were designed using Primer 5.0 and Oligo 6.0. Twenty-eight primer pairs were randomly selected for PCR with genomic DNA of the Huangshan variety of C. morifolium, and 27 amplified successfully, a success rate of 96.4%.

  18. Study of EST-SSR marker system of Cordyceps

    Institute of Scientific and Technical Information of China (English)

    管俊娇; 虞泓; 解云峰; 左世梅; 马荣锋; 曾文波

    2011-01-01

    Objective: To establish an EST-SSR marker system for Cordyceps using ESTs of C. bassiana and C. militaris. Method: Cordyceps ESTs were downloaded from the NCBI public database, and redundant and low-quality ESTs were removed. EST-SSR primers were designed with Sequece Seiner 1.2 and screened by PAGE electrophoresis. Result: For C. bassiana, 4,556 non-redundant ESTs with a total length of 2,953,173 bp were selected. A total of 718 EST-SSRs, distributed in 616 ESTs, were identified, accounting for 15.8% of the non-redundant ESTs, or about one EST-SSR per 4.1 kb of EST sequence. Trinucleotide repeats were the most abundant type, with 419 repeat sequences. For C. militaris, 1,363 non-redundant ESTs were obtained, from which 1,117 EST-SSRs were identified, a rate of 81.95% SSR sites per EST. The dominant SSR motif was the nucleotide A. Fifty EST-SSR primer pairs were designed from the C. bassiana ESTs, and a preliminary test showed that 34 pairs (68%) amplified clear fragments. Furthermore, 39 of the 40 primer pairs (97.5%) designed from the C. militaris ESTs amplified clear fragments. Phylogenetic analysis divided the different anamorphs of Cordyceps species into four branches. Conclusion: EST-SSRs of Cordyceps have comparatively high utility value. The EST-SSR markers developed from ESTs of C. bassiana and C. militaris transfer well within Cordyceps, and EST-SSR markers appear to offer an easy and effective way to assay the molecular genetic structure of Cordyceps.

  19. Development and Characterization of Polymorphic EST-SSR and Genomic SSR Markers for Tibetan Annual Wild Barley

    OpenAIRE

    Mian Zhang; Weihua Mao; Guoping Zhang; Feibo Wu

    2014-01-01

    Tibetan annual wild barley is rich in genetic variation. This study was aimed at the exploitation of new SSRs for the genetic diversity and phylogenetic analysis of wild barley by data mining. We developed 49 novel EST-SSRs and confirmed 20 genomic SSRs for 80 Tibetan annual wild barley and 16 cultivated barley accessions. A total of 213 alleles were generated from 69 loci with an average of 3.14 alleles per locus. The trimeric repeats were the most abundant motifs (40.82%) among the EST-SSRs...

  20. Development of EST-SSR markers in Barringtonia racemosa (Lecythidaceae) and cross-amplification in related species

    OpenAIRE

    Xie, Hongxian; Yuan, Yang; Fang, Xiaoting; Liu, Ying; Yang, Chao; Jin, Jianhua; Tan, Fengxiao; Huang, Yelin

    2015-01-01

    Premise of the study: Microsatellite markers were identified and characterized to study the genetic diversity and structure of Barringtonia racemosa (Lecythidaceae). Methods and Results: Based on the transcriptome data of B. racemosa, 30 primer pairs were initially designed and tested, of which 15 were successfully amplified and displayed clear polymorphisms across the 43 individuals from three distant populations tested in the study. The results showed that the number of alleles per locus ra...

  1. Cross-Species, Amplifiable EST-SSR Markers for Amentotaxus Species Obtained by Next-Generation Sequencing

    Directory of Open Access Journals (Sweden)

    Chiuan-Yu Li

    2016-01-01

    Full Text Available Amentotaxus, a genus of Taxaceae, is an ancient lineage with six relic and endangered species. Four Amentotaxus species, namely A. argotaenia, A. formosana, A. yunnanensis, and A. poilanei, are considered a species complex because of their morphological similarities. Small populations of these species are allopatrically distributed in Asian forests. However, only a few codominant markers have been developed and applied to study population genetic structure of these endangered species. In this study, we developed and characterized polymorphic expressed sequence tag-simple sequence repeats (EST-SSRs) from the transcriptome of A. formosana. We identified 4955 putative EST-SSRs from 68,281 unigenes as potential molecular markers. Twenty-six EST-SSRs were selected for estimating polymorphism and transferability among Amentotaxus species, of which 23 EST-SSRs were polymorphic within Amentotaxus species. Among these, the number of alleles ranged from 1–4, the polymorphism information content ranged from 0.000–0.692, and the observed and expected heterozygosity were 0.000–1.000 and 0.080–0.740, respectively. Population genetic structure analyses confirmed that A. argotaenia and A. formosana were separate species and A. yunnanensis and A. poilanei were the same species. These novel EST-SSRs can facilitate further population genetic structure research of Amentotaxus species.
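
    Polymorphism information content (PIC) and expected heterozygosity, as reported in the record above, are both functions of the allele frequencies at a locus. The sketch below assumes the widely used Botstein-style PIC formula and the simple (uncorrected) expected heterozygosity; the abstract does not say which estimators the authors used, and the function names are illustrative only.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for allele frequencies p_i at one locus (assumed estimator)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed Botstein-style formula)."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


# A monomorphic locus (one allele) carries no information, so both values are 0.
print(pic([1.0]), expected_heterozygosity([1.0]))
# Two equally frequent alleles.
print(pic([0.5, 0.5]), expected_heterozygosity([0.5, 0.5]))
```

    A locus with a single allele yields a PIC of zero under this formula, which matches the lower bound of 0.000 reported for loci with one allele.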

  2. Breeding strains of Panax notoginseng by using EST-SSR markers

    Institute of Scientific and Technical Information of China (English)

    张金渝; 杨维泽; 崔秀明; 虞泓; 金航; 陈中坚; 沈涛

    2011-01-01

    Objective: To compare the genetic variation and differentiation of different breeding strains of Panax notoginseng and thereby provide basic information for genetic breeding. Method: The genetic diversity and genetic differentiation of 17 breeding strains of P. notoginseng from four regions were assayed with 17 pairs of EST-SSR primers, some designed by the authors and some developed by others. Result: A total of 136 polymorphic EST-SSR loci were detected in the 17 breeding strains, with a mean polymorphism information content (PIC) of 0.78, Nei's gene diversity (H) of 0.139, and Shannon's information index (I) of 0.208. The coefficient of gene differentiation (Gst) among the 17 strains was 0.382. Genetic similarity and cluster analysis showed that the 17 strains of P. notoginseng and P. stipuleanatus fell into four major groups: the 17 strains of P. notoginseng formed three groups, and P. stipuleanatus formed a group of its own. Conclusion: After bulk selection, the strains selected from the same cultivated population showed a degree of genetic differentiation, and EST-SSR markers can be used to detect the effect of bulk selection.

  3. Development and Utilization of EST-SSR Marker in Sugarbeet (Beta vulgaris L.)

    Institute of Scientific and Technical Information of China (English)

    史树德; 魏磊; 张子义; 邵金旺; 田自华

    2011-01-01

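    EST-SSR markers were developed for sugar beet (Beta vulgaris L.) from the expressed sequence tag (EST) data available in the NCBI public database. Of the 29830 sugar beet EST sequences, 20109 non-redundant ESTs with a total length of 11287.6 kb were identified. From the 6951 ESTs containing microsatellite repeats, 2845 EST-SSRs that met the requirements for SSR primer design were finally obtained, an average of one SSR per 3.96 kb. Analysis of the distribution frequency and characteristics of the EST-SSRs showed that A/T mononucleotide repeats were the most frequent, followed by AAG/CTT trinucleotide repeats and AG/CT dinucleotide repeats, while hexanucleotide repeats such as ACCTCC/AGGTGG were the least frequent. One hundred SSR primer pairs were randomly synthesized and tested for polymorphism in six sugar beet varieties, which were divided into two groups by genetic similarity; the mean polymorphism information content (PIC) was 0.47. This study confirms that this new approach to developing sugar beet SSR markers is efficient and yields relatively high polymorphism, and that it has broad prospects for genetic diversity analysis, functional gene mapping, genetic map construction, and comparative genomics in sugar beet.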
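
    The density figure in the record above is a simple ratio of total sequence length to the number of SSRs. A minimal check using the reported totals (11287.6 kb and 2845 EST-SSRs) is sketched below; the function name is illustrative, and the result is only expected to agree with the reported 3.96 kb up to rounding.

```python
def kb_per_ssr(total_kb, n_ssrs):
    """Average length of sequence per SSR, in kb."""
    if n_ssrs <= 0:
        raise ValueError("n_ssrs must be positive")
    return total_kb / n_ssrs


# Totals taken from the sugar beet record above.
spacing = kb_per_ssr(11287.6, 2845)
print(f"about one SSR per {spacing:.2f} kb")
```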

  4. MannDB: A microbial annotation database for protein characterization

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Lam, M; Smith, J; Zemla, A; Dyer, M; Kuczmarski, T; Vitalis, E; Slezak, T

    2006-05-19

    MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high

  5. Transferability Analysis of Cassava EST-SSR and Genomic-SSR Markers in Jatropha and Rubber Tree

    Institute of Scientific and Technical Information of China (English)

    文明富; 陈新; 王海燕; 卢诚; 王文泉

    2011-01-01

    The Euphorbiaceae family includes many economically important species, such as rubber tree, cassava, castor bean, and Jatropha. Cassava (Manihot esculenta Crantz) ranks sixth among food crops in the world and is also an important tropical economic crop in China. Genomic SSRs are derived from the cassava genome, whereas EST-SSRs are derived from expressed sequence tags (ESTs). In this study, the transferability of 419 EST-SSR primer pairs and 182 genomic SSR primer pairs from cassava was tested in five Jatropha lines and two rubber tree lines. The transferability rate of cassava EST-SSRs in Jatropha and rubber tree was 55.85% and 38.90%, respectively, and that of cassava genomic SSRs was 37.36% and 26.37%, respectively. The transferability of cassava EST-SSRs was therefore higher than that of genomic SSRs. In addition, the transferability of both cassava EST-SSRs and genomic SSRs was higher in Jatropha than in rubber tree. These results suggest that the transferable cassava SSR primers can be used for comparative mapping, gene tagging, and QTL mapping among cassava, Jatropha, and rubber tree.
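
    The transferability rates in the record above are the fraction of tested primer pairs that amplified in the target species. The sketch below back-calculates the whole-number primer counts implied by the reported percentages; these counts are inferred for illustration and are not stated in the abstract, and the function names are hypothetical.

```python
def implied_count(n_tested, rate_percent):
    """Nearest whole number of primer pairs consistent with a reported rate."""
    return round(n_tested * rate_percent / 100.0)


def rate_percent(n_success, n_tested):
    """Transferability rate in percent, rounded to two decimals."""
    return round(100.0 * n_success / n_tested, 2)


# (marker type, target species) -> (primer pairs tested, reported rate in %)
reported = {
    ("EST-SSR", "Jatropha"): (419, 55.85),
    ("EST-SSR", "rubber tree"): (419, 38.90),
    ("genomic SSR", "Jatropha"): (182, 37.36),
    ("genomic SSR", "rubber tree"): (182, 26.37),
}

for (marker, species), (n_tested, pct) in reported.items():
    n_ok = implied_count(n_tested, pct)
    # Recomputing the rate from the inferred count should reproduce the reported value.
    print(marker, species, n_ok, rate_percent(n_ok, n_tested))
```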

  6. Transferability analysis of EST-SSR markers of Castanea mollissima to Castanopsis fargesii

    Institute of Scientific and Technical Information of China (English)

    李春; 孙晔

    2012-01-01

    Simple sequence repeats (SSRs), also known as microsatellite molecular markers, have been cross-amplified successfully in closely related species of the same genus and even across genera within the same family. In the present study, 124 polymorphic EST-SSR primer pairs originally developed from Castanea mollissima were cross-amplified in Castanopsis fargesii. The results indicated that 42.7% of the Castanea mollissima EST-SSR primers were successfully cross-amplified in Castanopsis fargesii and that 56.6% were polymorphic. The genetic diversity of four natural populations of C. fargesii was investigated with 19 polymorphic EST-SSR primer pairs; preliminary results showed that C. fargesii possessed high levels of genetic diversity (Na=6.105, Ho=0.563, He=0.621). These polymorphic EST-SSR primers provide a powerful tool for further investigation of the population genetics of C. fargesii.

  7. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.)

    OpenAIRE

    Yang Jun-Bo; Tian Bo; Yang Chun; Qiu Lijun; Liu Aizhong

    2010-01-01

    Abstract Background The castor bean (Ricinus communis L.), a monotypic species in the spurge family (Euphorbiaceae, 2n = 20), is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85%) in its seed oil, the castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesive and biodiesel. Due to lack of efficient molecular...

  8. Transcriptome Sequencing Analysis and Development of EST-SSR Markers for Pinus koraiensis

    Institute of Scientific and Technical Information of China (English)

    张振; 张含国; 莫迟; 张磊

    2015-01-01

    ...) breeding programs, the lack of co-dominant genetic markers has constrained the development of molecular marker-assisted breeding. At present, developing SSR markers from transcriptome data is still an economical and efficient strategy for DNA marker development. In this study, we used high-throughput sequencing technology to develop EST-SSR markers for Korean pine. The distribution patterns of the markers in the transcriptome sequences and their characteristics were analyzed in order to provide a basis for analyzing SSR diversity and mutation in Korean pine. [Method] A total of 1 757 SSR sites were identified from 41 476 unigenes in the Korean pine transcriptome using an SSR search program. Statistical analyses were conducted on the number, distribution, and characteristics of the SSR loci, and 101 pairs of SSR primers were designed and synthesized. Agarose electrophoresis was used for an initial check, and polyacrylamide gel electrophoresis was used to separate and detect primer polymorphisms. Amplification products were then collected and sequenced for validation. Finally, 16 pairs of SSR primers and 6 pairs of fluorescence primers were identified. To study genetic variation, 53 samples of open-pollinated progeny were collected from each of four seed orchards in Hegang, Linkou, Tieli and Weihe. [Result] The distribution frequency of EST-SSRs (the ratio of the number of SSRs to the total number of unigenes) was 4.24%, based on the transcriptome sequences. Mononucleotide, dinucleotide and trinucleotide repeats accounted for 46.90%, 17.12% and 34.66% of the total SSRs, respectively. The number of repeats per SSR ranged from 5 to 24. Twenty-one of the 101 primer pairs showed polymorphism, accounting for 20.8% of the total. Sequencing validation showed that 16 primer pairs amplified the target sequence. Eighteen alleles were detected with the 6 pairs of fluorescence primers. Polymorphic information content (PIC) was 0.0363 - 0.6674

  9. Mining and transferability analysis of EST-SSR primers in Kiwifruit (Actinidia spp.)

    Institute of Scientific and Technical Information of China (English)

    廖娇; 黄春辉; 辜青青; 曲雪艳; 徐小彪

    2011-01-01

    ESTs of Actinidia in the NCBI database were downloaded and screened with SSRHunter software, EST-SSR primers were designed with Primer 5.0, and their transferability was analyzed in some Citrus plants. The 97 designed EST-SSR primer pairs were screened using genomic DNA of Actinidia lijiangensis as the template. The results indicated that 77 primer pairs produced clear amplification, accounting for 79.38%. Twenty randomly selected primer pairs were tested by PCR on DNA from 10 Actinidia varieties; 17 of the 20 showed amplification and polymorphism, accounting for 85%. The transferability of 20 randomly selected pairs of EST-SSR primers was explored in 9 Citrus germplasms (Okitsu, Kumquat, Trifoliate Orange, Ponkan, Sour Pummelo, Sweet Orange, Huyou, Owari, Miyagawa). Eleven of the 20 tested primer pairs produced amplification, accounting for 55%, and 8 of these showed polymorphism, accounting for 72.7%. The results revealed that the EST-SSR markers of Actinidia were transferable to some Citrus germplasms.
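
    Screening ESTs for SSRs, as in the record above, amounts to scanning each sequence for short motifs repeated in tandem. The sketch below is a generic regular-expression scan, not the SSRHunter algorithm, and the minimum repeat counts in `MIN_REPEATS` are hypothetical defaults chosen only for illustration; the abstract does not state the thresholds that were used.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG" = "AG" x 2),
            # since those are reported under the shorter motif length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence: (AG)8 and (CCT)5 embedded in short flanking sequence.
example = "TTC" + "AG" * 8 + "AA" + "CCT" * 5 + "GA"
print(find_ssrs(example))
```

    Primer design would then target the flanking sequence on either side of each reported start position; that step is outside the scope of this sketch.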

  10. Assessment of Functional EST-SSR Markers (Sugarcane) in Cross-Species Transferability, Genetic Diversity among Poaceae Plants, and Bulk Segregation Analysis

    Science.gov (United States)

    Ul Haq, Shamshad; Kumar, Pradeep; Singh, R. K.; Verma, Kumar Sambhav; Bhatt, Ritika; Sharma, Meenakshi; Kachhwaha, Sumita; Kothari, S. L.

    2016-01-01

    Expressed sequence tags (ESTs) are an important resource for gene discovery, gene expression and its regulation, molecular marker development, and comparative genomics. We procured 10000 ESTs and analyzed 267 EST-SSR markers through a computational approach. The average density was one SSR/10.45 kb or 6.4% frequency, wherein trinucleotide repeats (66.74%) were the most abundant followed by di- (26.10%), tetra- (4.67%), penta- (1.5%), and hexanucleotide (1.2%) repeats. Functional annotations were done, and 63 newly developed EST-SSRs were then used for cross transferability, genetic diversity, and bulk segregation analysis (BSA). Out of 63 EST-SSRs, 42 markers were identified owing to their expansion genetics across 20 different plants which amplified 519 alleles at 180 loci with an average of 2.88 alleles/locus and the polymorphic information content (PIC) ranged from 0.51 to 0.93 with an average of 0.83. The cross transferability ranged from 25% for wheat to 97.22% for Schlerostachya, with an average of 55.86%, and genetic relationships were established based on diversification among them. Moreover, 10 EST-SSRs were recognized as important markers between bulks of pooled DNA of sugarcane cultivars through BSA. This study highlights the employability of the markers in transferability, genetic diversity in grass species, and distinguished sugarcane bulks. PMID:27340568

  12. Assessment of Functional EST-SSR Markers (Sugarcane) in Cross-Species Transferability, Genetic Diversity among Poaceae Plants, and Bulk Segregation Analysis

    Directory of Open Access Journals (Sweden)

    Shamshad Ul Haq

    2016-01-01

    Full Text Available Expressed sequence tags (ESTs) are an important resource for gene discovery, gene expression and its regulation, molecular marker development, and comparative genomics. We procured 10000 ESTs and analyzed 267 EST-SSR markers through a computational approach. The average density was one SSR/10.45 kb or 6.4% frequency, wherein trinucleotide repeats (66.74%) were the most abundant followed by di- (26.10%), tetra- (4.67%), penta- (1.5%), and hexanucleotide (1.2%) repeats. Functional annotations were done, and 63 newly developed EST-SSRs were then used for cross transferability, genetic diversity, and bulk segregation analysis (BSA). Out of 63 EST-SSRs, 42 markers were identified owing to their expansion genetics across 20 different plants which amplified 519 alleles at 180 loci with an average of 2.88 alleles/locus and the polymorphic information content (PIC) ranged from 0.51 to 0.93 with an average of 0.83. The cross transferability ranged from 25% for wheat to 97.22% for Schlerostachya, with an average of 55.86%, and genetic relationships were established based on diversification among them. Moreover, 10 EST-SSRs were recognized as important markers between bulks of pooled DNA of sugarcane cultivars through BSA. This study highlights the employability of the markers in transferability, genetic diversity in grass species, and distinguished sugarcane bulks.

  13. Purity Identification and SNP Site Obtain of Chinese Cabbage Hybrids Using Multiplex EST-SSR Marker

    Institute of Scientific and Technical Information of China (English)

    赵新; 王永; 兰青阔; 贺长征; 陈锐; 李欧静; 刘娜; 朱珠; 郭永泽

    2013-01-01

    In this research, 10 Chinese cabbage (Brassica campestris L. ssp. pekinensis (Lour.) Olsson) hybrids and their parents were taken as study objects. Based on the EST sequences in the NCBI genome database, a SNP site and multiplex EST-SSR sites that could be used to identify the purity of Chinese cabbage hybrids were screened using SSR markers and high resolution melting (HRM) technology. To meet the requirements of seed purity identification by high-throughput molecular technology, key techniques such as the seedling culture conditions for single seeds, the state of single plants at extraction, a rapid DNA extraction method, and the establishment of a multiplex PCR system were explored and optimized. The results showed that a single germinating seed at 48 h yielded DNA of relatively high quality within 3 h by the improved CTAB method. Leaves from single plants that had reached the seedling stage after 7 d could be processed by the Chelex-100 method within 40 min. The SNP site obtained by screening could be used to identify the purity of 6 Chinese cabbage hybrids, and the multiplex EST-SSR loci could be used to identify the purity of 10 Chinese cabbage hybrids.

  14. EST-SSR Based Genetic Diversity Analysis on Salt Tolerant Plants from Six Species in Chenopodiaceae

    Institute of Scientific and Technical Information of China (English)

    徐照龙; 易金鑫; 余桂红; 张大勇; 何晓兰; 王秀娥; 马鸿翔

    2011-01-01

    This report focuses on the EST-SSR-based evaluation of the genetic basis and genetic diversity of salt-tolerant plants from six species in Chenopodiaceae, with the aim of providing fast and reliable marker-assisted selection tools for breeding salt-tolerant Chenopodiaceae plants. Thirty-one pairs of EST-SSR primers were designed from EST sequences of the genera Salicornia and Suaeda. Sixteen of these primer pairs successfully amplified DNA fragments by PCR across all samples, a transferability rate of 51.6%. A total of 18 polymorphic loci were detected by the 16 primer pairs, and the number of alleles at each locus ranged from 2 to 4, indicating a wide range of genetic diversity. Cluster analysis based on Nei's genetic distance showed that the six plants could be grouped into three clades, and this division was confirmed by principal component analysis. Moreover, the grouping was mainly attributable to polymorphism in three ESTs, DY529957, DY529903 and DY529885. According to sequence similarity searches against GenBank, the first two encode an auxin-repressed protein (ARP) and a plant defensin (Def), respectively, both of which are involved in plant stress responses but belong to different metabolic pathways, while the third encodes an unknown protein. Overall, the 16 SSR primer pairs showed good transferability among the six Chenopodiaceae species, revealed extensive genetic diversity among them, and provided molecular evidence that they have different salt-tolerance mechanisms.

  15. EST-SSR Fingerprinting of Fifty Cabbage Representative Varieties from China

    Institute of Scientific and Technical Information of China (English)

    王庆彪; 张扬勇; 庄木; 杨丽梅; 刘玉梅; 吕红豪; 方智远

    2014-01-01

    [Objective] The aim of this study was to develop a DNA fingerprinting method for cabbage and to fingerprint fifty representative cabbage varieties from China with EST-SSR primers, providing a reference for identifying variety distinctness, authenticity, and purity. [Method] First, EST-SSR primers were screened using 6% denaturing polyacrylamide gel electrophoresis and six cabbage varieties from different ecogeographic origins. The lengths of the amplified fragments were detected on a DNA Analyzer platform using four fluorescent labels (TAMRA, HEX, ROX and 6-FAM) at the 5′ end of the forward primer, and a reference variety was then defined for every allele. A total of twenty core primers were used to establish SSR fingerprints of the fifty cabbage varieties and to identify the variety ‘Zhonggan 21’. [Result] Six cabbage varieties from different sources were used to screen 978 EST-SSR primers, from which 128 polymorphic primers were obtained. Based on PCR band stability, high polymorphism information content (PIC), easy discrimination of different alleles, and even distribution of the markers across the chromosomes, 20 primer pairs were selected; they detected a total of 58 alleles at 20 loci, with 2.22 loci per chromosome and 2.9 alleles per locus on average. PIC values ranged from 0.34 to 0.76 among the primers, and amplified fragment lengths ranged from 143 to 296 bp. The maximum number of alleles, five, was detected by primer pairs BoE607 and BoE723. A fingerprint database of 50 representative cabbage varieties from China was established with the 20 core primer pairs. The authenticity of ‘Zhonggan 21’ was identified using an artificially simulated population, and the results were identical to those of the field investigation. [Conclusion] Twenty core primer pairs were selected and used to establish a DNA fingerprint database of 50 representative cabbage varieties from China, and the authenticity of ‘Zhonggan 21’ was identified by

  16. Discovery and Characterization of Chromatin States for Systematic Annotation of the Human Genome

    Science.gov (United States)

    Ernst, Jason; Kellis, Manolis

    A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.

  17. Annotated English

    CERN Document Server

    Hernandez-Orallo, Jose

    2010-01-01

    This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and located in such a way that the original English text is not altered (not even a letter), thus allowing for a consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that make the frequency of annotations not dramatically high. This makes the reader easily associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which otherwise are weakened by the enormous amount of exceptions in English pronunciation. The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size. This means that we can get an annotated version of any document or book with the same number of pages and fontsize. Since no letter is affected, the ...

  18. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available BACKGROUND: Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. RESULTS: Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. CONCLUSION: The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  19. Development of EST-SSR Markers from Jatropha curcas (Euphorbiaceae) and Their Application in Genetic Diversity Analysis among Germplasms

    Institute of Scientific and Technical Information of China (English)

    杨春; 刘爱忠

    2011-01-01

    Jatropha curcas (Euphorbiaceae) has created tremendous interest all over the world for the use of its seed oil as a commercial source of biodiesel. Based on 9843 ESTs available from the developing seeds of Jatropha curcas, we identified 1009 SSRs in 4640 unigenes and developed 11 polymorphic EST-SSR markers, which revealed a low level of genetic diversity among germplasms: the number of alleles varied from 2 to 3, with an average of 2.45; expected heterozygosity (He) ranged from 0.0887 to 0.5128, with an average of 0.2736; and polymorphic information content (PIC) ranged from 0.0847 to 0.4031, with an average of 0.2313. Further, we analyzed the genetic relationships among 24 germplasms collected from different areas in southern China, northern Vietnam, and India using the 11 EST-SSR markers. The results showed no geographic pattern of genotypes across the collection areas of Jatropha curcas. The EST-SSR markers developed in the current study are useful for both genetic diversity analysis and identification of genetic relationships among germplasms in Jatropha curcas.

  20. Annotated Answer Set Programming

    OpenAIRE

    Straccia, Umberto

    2005-01-01

    We present Annotated Answer Set Programming, which extends the expressive power of disjunctive logic programming with annotation terms taken from the generalized annotated logic programming framework.

  1. Development and Characterization of Microsatellite Markers from the Transcriptome of Firmiana danxiaensis (Malvaceae s.l.)

    Directory of Open Access Journals (Sweden)

    Qiang Fan

    2013-11-01

    Full Text Available Premise of the study: Firmiana consists of 12–16 species, many of which are narrow endemics. Expressed sequence tag (EST)–simple sequence repeat (SSR) markers were developed and characterized for size polymorphism in four Firmiana species. Methods and Results: A total of 102 EST-SSR primer pairs were designed based on the transcriptome sequences of F. danxiaensis; these were then characterized in four Firmiana species: F. danxiaensis, F. kwangsiensis, F. hainanensis, and F. simplex. In these four species, 17 primer pairs were successfully amplified, and 14 were polymorphic in at least one species. The number of alleles ranged from one to 13, and the observed and expected heterozygosities ranged from 0 to 1 and 0 to 0.925, respectively. The lowest level of polymorphism was observed in F. danxiaensis. Conclusions: These polymorphic EST-SSR markers are valuable for conservation genetics studies in the endangered Firmiana species.

  2. Modeling Loosely Annotated Images with Imagined Annotations

    CERN Document Server

    Tang, Hong; Chen, Yunhao

    2008-01-01

    In this paper, we present an approach to learning latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: (1) ambiguous correspondences between visual features and annotated keywords; (2) incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning topic models. In particular, some imagined keywords are added to the incomplete annotation by measuring similarity between keywords. Then, both given and imagined annotations are used to learn probabilistic topic models for automatically annotating new images. We conduct experiments on a typical Corel dataset of images and loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods (using a set of discrete blobs to represent an image). The proposed method improves word-driven probabilistic Latent Semantic Analysis (PLSA-words) up to ...

  3. Annotated bibliography

    International Nuclear Information System (INIS)

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography, which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners.

  4. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.

  5. The transferability of SSR and EST-SSR markers of different origins in Elymus and Roegneria in the Triticeae (Poaceae)

    Institute of Scientific and Technical Information of China (English)

    陈仕勇; 马啸; 张新全; 陈智华; 周凯

    2016-01-01

    important forage grasses, and also carry elite genes for improving cereal crops. However, there is ongoing dispute on the species boundaries and interspecific systematic relationships in the two genera. The biosystematic relationships in the group are of keen interest to agrostologists around the world. Based on known SSR markers in Triticeae, the objective of this study was to screen out the high transferability markers for Elymus and Roegneria, which would provide important information for understanding the biosystematic relationships of the two genera. A total of 230 simple sequence repeats (SSR) and SSR based on expressed sequence tags (EST-SSR) markers from 5 different genera including wheat, barley, Elymus, Pseudoroegneria and Leymus, were used to study the transferability to 23 species containing the St, H, and Y genomes. Among the 230 SSR markers, 163 (70.87%) markers could generate clear bands, which showed a high transferability for those markers. The EST-SSR markers (87.60%) showed a higher transferability than genomic SSR markers (49.50%), but the genomic SSR markers showed a higher polymorphism (85.98%) than the EST-SSR markers (79.37%). A total of 579 bands were amplified from the 163 SSR markers able to generate clear bands, of which 533 bands were polymorphic. The number of amplified bands of each of these 163 SSR markers ranged from 1 to 11 with an average of 3.55 bands. Cluster analysis using the unweighted pair group method with arithmetic average (UPGMA) showed that the species with same or similar genome could be grouped together. Additionally, the H genome showed a rather distant phylogenetic relationship with St genome, and distant phylogenetic relationships were also revealed between the diploid species containing the H genome and other species in the study. The selected SSR markers from 5 genera could be amplified successfully in Elymus and Roegneria. To summarise, a total of 163 high transferability SSR markers were found to be suitable for the further phylogeny a
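
    The UPGMA clustering named in this record repeatedly merges the two clusters with the smallest average pairwise distance between their members. The following is a minimal pure-Python sketch of that procedure. The taxon labels and the distance values are hypothetical and chosen only for illustration; the abstract does not state how distances were derived from the band data.

```python
from itertools import combinations


def average_distance(c1, c2, dist):
    """Mean leaf-to-leaf distance between two clusters (the UPGMA criterion)."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(names, dist):
    """Return the merge history as (cluster_1, cluster_2, distance) tuples.

    `dist` maps frozenset({name_a, name_b}) to the distance between two leaves.
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_distance(c1, c2, dist)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical taxa and distances, for illustration only.
example_names = ["St_1", "St_2", "H_1"]
example_dist = {
    frozenset(("St_1", "St_2")): 0.2,
    frozenset(("St_1", "H_1")): 0.8,
    frozenset(("St_2", "H_1")): 0.6,
}
for left, right, height in upgma(example_names, example_dist):
    print(left, right, round(height, 3))
```

    Recomputing averages from leaf distances at every step is slow for large data sets but keeps the sketch short and makes the "unweighted" averaging explicit.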

  6. Annotations for Intersection Typechecking

    Directory of Open Access Journals (Sweden)

    Joshua Dunfield

    2013-07-01

    Full Text Available In functional programming languages, the classic form of annotation is a single type constraint on a term. Intersection types add complications: a single term may have to be checked several times against different types, in different contexts, requiring annotation with several types. Moreover, it is useful (in some systems, necessary) to indicate the context in which each such type is to be used. This paper explores the technical design space of annotations in systems with intersection types. Earlier work (Dunfield and Pfenning 2004) introduced contextual typing annotations, which we now tease apart into more elementary mechanisms: a "right hand" annotation (the standard form), a "left hand" annotation (the context in which a right-hand annotation is to be used), a merge that allows for multiple annotations, and an existential binder for index variables. The most novel element is the left-hand annotation, which guards terms (and right-hand annotations) with a judgment that must follow from the current context.

  7. Computing human image annotation.

    Science.gov (United States)

    Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Rubin, Daniel L

    2009-01-01

    An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human (or machine) observer. An image markup is the graphical symbols placed over the image to depict an annotation. In the majority of current, clinical and research imaging practice, markup is captured in proprietary formats and annotations are referenced only in free text radiology reports. This makes these annotations difficult to query, retrieve and compute upon, hampering their integration into other data mining and analysis efforts. This paper describes the National Cancer Institute's Cancer Biomedical Informatics Grid's (caBIG) Annotation and Image Markup (AIM) project, focusing on how to use AIM to query for annotations. The AIM project delivers an information model for image annotation and markup. The model uses controlled terminologies for important concepts. All of the classes and attributes of the model have been harmonized with the other models and common data elements in use at the National Cancer Institute. The project also delivers XML schemata necessary to instantiate AIMs in XML as well as a software application for translating AIM XML into DICOM S/R and HL7 CDA. Large collections of AIM annotations can be built and then queried as Grid or Web services. Using the tools of the AIM project, image annotations and their markup can be captured and stored in human and machine readable formats. This enables the inclusion of human image observation and inference as part of larger data mining and analysis activities. PMID:19964202

  8. Personnalisation de Systèmes OLAP Annotés

    CERN Document Server

    Jerbi, Houssem; Ravat, Franck; Teste, Olivier

    2010-01-01

    This paper deals with personalization of annotated OLAP systems. Data constellation is extended to support annotations and user preferences. Annotations reflect the decision-maker experience whereas user preferences enable users to focus on the most interesting data. User preferences allow annotated contextual recommendations helping the decision-maker during his/her multidimensional navigations.

  9. Creating Annotation Tools with the Annotation Graph Toolkit

    OpenAIRE

    Maeda, Kazuaki; Bird, Steven; Ma, Xiaoyi; Lee, Haejoong

    2002-01-01

    The Annotation Graph Toolkit is a collection of software supporting the development of annotation tools based on the annotation graph model. The toolkit includes application programming interfaces for manipulating annotation graph data and for importing data from other formats. There are interfaces for the scripting languages Tcl and Python, a database interface, specialized graphical user interfaces for a variety of annotation tasks, and several sample applications. This paper describes all ...

  10. An Introduction to Genome Annotation.

    Science.gov (United States)

    Campbell, Michael S; Yandell, Mark

    2015-12-17

    Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.

  11. On Anomalies in Annotation Systems

    CERN Document Server

    Brust, Matthias R

    2007-01-01

    Today's computer-based annotation systems implement a wide range of functionalities that often go beyond those available in traditional paper-and-pencil annotations. Conceptually, annotation systems are based on thoroughly investigated psycho-sociological and pedagogical learning theories. They offer a huge diversity of annotation types that can be placed in textual as well as in multimedia format. Additionally, annotations can be published or shared with a group of interested parties via well-organized repositories. Although highly sophisticated annotation systems exist both conceptually as well as technologically, we still observe that their acceptance is somewhat limited. In this paper, we argue that nowadays annotation systems suffer from several fundamental problems that are inherent in the traditional paper-and-pencil annotation paradigm. As a solution, we propose to shift the annotation paradigm for the implementation of annotation system.

  12. Development and characterization of a Psathyrostachys huashanica Keng 7Ns chromosome addition line with leaf rust resistance.

    Directory of Open Access Journals (Sweden)

    Wanli Du

    Full Text Available The aim of this study was to characterize a Triticum aestivum-Psathyrostachys huashanica Keng (2n = 2x = 14, NsNs) disomic addition line 2-1-6-3. Individual line 2-1-6-3 plants were analyzed using cytological, genomic in situ hybridization (GISH), EST-SSR, and EST-STS techniques. The alien addition line 2-1-6-3 was shown to have two P. huashanica chromosomes, with a meiotic configuration of 2n = 44 = 22 II. We tested 55 EST-SSR and 336 EST-STS primer pairs that mapped onto seven different wheat chromosomes using DNA from parents and the P. huashanica addition line. One EST-SSR and nine EST-STS primer pairs indicated that the additional chromosome of P. huashanica belonged to homoeologous group 7. The diagnostic fragments of five EST-STS markers (BE404955, BE591127, BE637663, BF482781 and CD452422) were cloned, sequenced and compared. The results showed that the amplified polymorphic bands of P. huashanica and disomic addition line 2-1-6-3 shared 100% sequence identity, and the line was designated as the 7Ns disomic addition line. Disomic addition line 2-1-6-3 was evaluated to test the leaf rust resistance of adult stages in the field. We found that one pair of the 7Ns genome chromosomes carried new leaf rust resistance gene(s). Moreover, wheat line 2-1-6-3 had superior numbers of florets and grains per spike, which were associated with the introgression of the paired P. huashanica chromosomes. These high levels of disease resistance and stable, excellent agronomic traits suggest that this line could be utilized as a novel donor in wheat breeding programs.

  13. Semantic annotation of mutable data.

    Directory of Open Access Journals (Sweden)

    Robert A Morris

    Full Text Available Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.

  14. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    2012-07-12

    Abstract BACKGROUND: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. DESCRIPTION: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes

  15. Human Genome Annotation

    Science.gov (United States)

    Gerstein, Mark

    A central problem for 21st century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) based on processing next-generation sequencing experiments. I will further explain how we cluster together groups of sites to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome and analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation. Much of this work has been carried out in the framework of the ENCODE, modENCODE, and 1000 genomes projects.

  16. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, D. [UCLA; Casero, D. [UCLA; Cokus, S. J. [UCLA; Merchant, S. S. [UCLA; Pellegrini, M. [UCLA

    2012-07-01

    The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

  17. Development and Characterization of EST-SSR Markers from NCBI and cDNA Library in Cultivated Peanut (Arachis hypogaea L.)

    Institute of Scientific and Technical Information of China (English)

    王金彦; 潘丽娟; 杨庆利; 禹山林

    2009-01-01

    A total of 86,132 ESTs downloaded from GenBank at NCBI and 12,501 ESTs from a cDNA library constructed from the high-oil linoleic acid accession E12 were analysed. After preprocessing, 18,051 singletons and 9,972 contigs were obtained from the NCBI and cDNA library sets. In total, 3,104 SSR loci were identified with the MISA software, accounting for 11.08% of these non-redundant ESTs. The SSR loci were divided into di-, tri-, tetra-, penta- and hexa-nucleotide repeats and compound (mixed) repeats. Tri-nucleotide motifs were the most abundant, with frequencies of 43.0% and 56.8% in the NCBI and cDNA library sets, respectively. Di- and penta-nucleotide motifs ranked second and third among all motifs, and hexa-nucleotide motifs were the least frequent in both sets. Among all repeat motifs, AG/TC was the most common, accounting for 8.65% and 13.42% in the NCBI and cDNA library sets, respectively. Among the tri-nucleotide repeats, CTT/GAA was the most frequent motif, accounting for 6.7% and 13.42%, respectively. The number of repeat units at the SSR loci ranged from 4 to 51.
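
    The screening step described in this record (scanning non-redundant sequences for di- to hexa-nucleotide repeats) can be illustrated with a short regular-expression scan. This is a simplified sketch and not the MISA program itself: the minimum repeat counts in MIN_REPEATS are hypothetical thresholds, the function names are illustrative, and compound repeats are not handled.

```python
import re

# Hypothetical minimum repeat counts per motif length (not the study's settings).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Made-up sequence containing one di- and one tri-nucleotide repeat.
example_seq = "GGC" + "AG" * 7 + "TTCA" + "CTT" * 5 + "GA"
for start, motif, count in find_ssrs(example_seq):
    print(start, motif, count)
```

    The is_primitive check keeps a run such as AGAGAGAG... from being reported a second time as a tetra- or hexa-nucleotide repeat.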

  18. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.

  19. Annotated bibliography traceability

    NARCIS (Netherlands)

    Narain, G.

    2006-01-01

    This annotated bibliography contains summaries of articles and chapters of books, which are relevant to traceability. After each summary there is a part about the relevancy of the paper for the LEI project. The aim of the LEI-project is to gain insight in several aspects of traceability in order to

  20. Collaborative Movie Annotation

    Science.gov (United States)

    Zad, Damon Daylamani; Agius, Harry

    In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which is increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively, and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented, which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.

  1. Annotation of Regular Polysemy

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector

    Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...

  2. Intellectuals in China: Annotations.

    Science.gov (United States)

    Parker, Franklin

    This annotated bibliography of 72 books, journal articles, government reports, and newspaper feature stories focuses on the changing role of intellectuals in China, primarily since the 1949 Chinese Revolution. Particular attention is given to the Hundred Flowers Movement of 1957 and the Cultural Revolution. Most of the cited works are in English,…

  3. Annotation: The Savant Syndrome

    Science.gov (United States)

    Heaton, Pamela; Wallace, Gregory L.

    2004-01-01

    Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…

  4. Annotation of Ehux ESTs

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-06-12

    Twenty-two percent of ESTs do not align with scaffolds. The EST pipeline assembles 17,126 consensus sequences from the unaligned ESTs. The annotation pipeline predicts 8,564 ORFs on the consensus sequences. Domain analysis of the ORFs reveals missing genes. Cluster analysis reveals missing genes. Expression analysis reveals potential strain-specific genes.

  5. Functional annotation and ENU

    OpenAIRE

    Gunn, Teresa M.

    2012-01-01

    Functional annotation of every gene in the mouse genome is a herculean task that requires a multifaceted approach. Many large-scale initiatives are contributing to this undertaking. The International Knockout Mouse Consortium (IKMC) plans to mutate every protein-coding gene, using a combination of gene trapping and gene targeting in embryonic stem cells. Many other groups are using the chemical mutagen ethylnitrosourea (ENU) or transposon-based systems to induce mutations, screening ...

  6. A Factor Graph Approach to Automated GO Annotation.

    Science.gov (United States)

    Spetale, Flavio E; Tapia, Elizabeth; Krsticevic, Flavia; Roda, Fernando; Bulacio, Pilar

    2016-01-01

    As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum. PMID:26771463

  7. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  8. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  9. Predicting word sense annotation agreement

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier;

    2015-01-01

    High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotations. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...
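
    As a rough illustration of the two target representations mentioned in the abstract, the sketch below computes a continuous observed agreement per instance as the fraction of agreeing annotator pairs, then bins it three ways. The pairwise definition, the cut points, and the example data are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations


def observed_agreement(labels):
    """Fraction of annotator pairs that chose the same sense for one instance."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        raise ValueError("need at least two annotations per instance")
    return sum(a == b for a, b in pairs) / len(pairs)


def discretize(score, low=1 / 3, high=2 / 3):
    """Three-way binning of an agreement score; the cut points are assumptions."""
    if score < low:
        return "low"
    if score < high:
        return "mid"
    return "high"


# Hypothetical sense labels from three annotators per instance.
instances = {
    "bank#1": ["river", "river", "river"],
    "bank#2": ["river", "money", "money"],
    "bank#3": ["river", "money", "seat"],
}

summary = {}
for name, labels in instances.items():
    score = observed_agreement(labels)
    summary[name] = (score, discretize(score))
print(summary)
```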

  10. The Ensembl gene annotation system.

    Science.gov (United States)

    Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J; Murphy, Daniel N; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul; Searle, Stephen M J

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html. PMID:27337980

  11. Detection and Characterization of Engineered Nanomaterials in the Environment: Current State-of-the-art and Future Directions Report, Annotated Bibliography, and Image Library

    Science.gov (United States)

    The increasing manufacture and implementation of engineered nanomaterials (ENMs) will continue to lead to the release of these materials into the environment. Reliably assessing the environmental exposure risk of ENMs will depend highly on the ability to quantify and characterize...

  12. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678

  13. Sentiment Analysis of Document Based on Annotation

    CERN Document Server

    Shukla, Archana

    2011-01-01

    I present a tool that assesses the quality or usefulness of a document based on its annotations. Annotations may include comments, notes, observations, highlights, underlines, explanations, questions, or help. Comments are used for evaluative purposes, while the other types are used for summarization or expansion. Comments may also be made on another annotation; such annotations are referred to as meta-annotations. Not all annotations receive equal weight. The tool considers highlights, underlines, and comments to infer the collective sentiment of annotators, which is classified as positive, negative, or objective. It computes the collective sentiment in two ways: it counts all the annotations present on the document, and it computes sentiment scores for all annotations that include comments, to obtain the collective sentiment about the document or to judge its quality. I demonstrate the use of the tool on a research paper.

  14. Representing NCBO Annotator results in standard RDF with the Annotation Ontology

    OpenAIRE

    Melzi, Soumia; Jonquet, Clement

    2014-01-01

    Semantic annotation is part of the Semantic Web vision. The Annotation Ontology is a model that has been proposed to represent any annotations in standard RDF. The NCBO Annotator Web service is a broadly used service for annotations in the biomedical domain, offered within the BioPortal platform and giving access to more than 350 ontologies. This paper presents a new output format to represent the NCBO Annotator results in RDF with the Annotation Ontology. We briefly...

  15. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT) tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles

    Directory of Open Access Journals (Sweden)

    Boussaha Mekki

    2006-05-01

    Full Text Available Abstract Background The Fragile Histidine Triad gene (FHIT) is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of the human common fragile sites, where FHIT lies. Vesical tumors also affect cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. Results The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split into ten exons as in man, with exons 5 to 9 coding for a 149 amino acid protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced; they originate from alternative usage of three variants of exon 4 and are very close in size to the major human FHIT cDNAs. Conclusion A comparative genomic approach made it possible to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will make it possible to study the role of FHIT in bovine cancerogenesis.

  16. Development of EST-SSR and TRAP markers from transcriptome sequencing data of the mango.

    Science.gov (United States)

    Luo, C; Wu, H X; Yao, Q S; Wang, S B; Xu, W T

    2015-01-01

    Mango is one of the most commercially important fruit crops in tropical and subtropical regions. To increase the efficiency of breeding strategies, two EST-derived marker systems were developed in the present study using information from the mango fruit transcriptome. Using simple sequence repeats, 218 of 230 primer pairs showed stable amplification for 7 mango genotypes with amplicons ranging from 84 to 160 bp; 93 of the primer pairs yielded polymorphic products. The proportion of polymorphic bands ranged from 16.67 to 100%, with a mean of 55.64%. In contrast, 86 primer pairs exhibited good amplification with clear bands for target region amplification polymorphism analysis, and a total of 66 primer combinations were polymorphic. These two novel sets of EST-derived markers will be of use in future studies of genetic diversity, genetic map construction, and marker-assisted selection in mango. PMID:26214472
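
    For readers unfamiliar with how simple sequence repeats are located in EST or transcriptome sequences before primers are designed, the following minimal sketch scans a sequence for perfect di-, tri- and tetranucleotide repeats with a regular expression. It is not the pipeline used in the study above; the minimum repeat counts and the example sequence are assumptions chosen for illustration.

```python
import re

# Minimum number of tandem copies per motif length. These thresholds are
# illustrative assumptions, not values taken from the abstract.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not ATAT or AAA)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical sequence containing an (AG)7 and a (CAG)5 repeat.
example = "GGCT" + "AG" * 7 + "TTC" + "CAG" * 5 + "GT"
ssrs = find_ssrs(example)
print(ssrs)
```

    Start positions are 0-based. Dedicated SSR-mining tools additionally handle compound and imperfect repeats, which this sketch ignores.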

  17. Sequence alignment status and amplicon size difference affecting EST-SSR primer performance and polymorphism

    Science.gov (United States)

    Little attention has been given to failed, poorly-performing, and non-polymorphic expressed sequence tag (EST) simple sequence repeat (SSR) primers. This is due in part to a lack of interest and value in reporting them but also because of the difficulty in addressing the causes of failure on a prime...

  18. Transferability and polymorphism of barley EST-SSR markers used for phylogenetic analysis in Hordeum chilense

    OpenAIRE

    Dorado Gabriel; Varshney Rajeev K; Budak Hikmet; Castillo Almudena; Graner Andreas; Hernandez Pilar

    2008-01-01

    Abstract Background Hordeum chilense, a native South American diploid wild barley, is a potential source of useful genes for cereal breeding. The use of this wild species to increase genetic variation in cereals will be greatly facilitated by marker-assisted selection. Different economically feasible approaches have been undertaken for this wild species with limited direct agricultural use in a search for suitable and cost-effective markers. The availability of Expressed Sequence Tags (EST) d...

  19. NCBI prokaryotic genome annotation pipeline.

    Science.gov (United States)

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. PMID:27342282

  1. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.

    Science.gov (United States)

    Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E; Thomas, Paul D

    2011-09-01

    The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods. PMID:21873635

  2. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  3. ProGMap: an integrated annotation resource for protein orthology

    NARCIS (Netherlands)

    Kuzniar, A.; Lin, K.; He, Y.; Nijveen, H.; Pongor, S.; Leunissen, J.A.M.

    2009-01-01

    Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web-tool designed to help researchers and database annotato

  4. Phylogenetic molecular function annotation

    OpenAIRE

    Barbara E Engelhardt; Jordan, Michael I.; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic ...

  5. Annotated Bibliography, Grades K-6.

    Science.gov (United States)

    Massachusetts Dept. of Education, Boston. Bureau of Nutrition Education and School Food Services.

    This annotated bibliography on nutrition is for the use of teachers at the elementary grade level. It contains a list of books suitable for reading about nutrition and foods for pupils from kindergarten through the sixth grade. Films and audiovisual presentations for classroom use are also listed. The names and addresses from which these materials…

  6. Bioinformatics for plant genome annotation

    NARCIS (Netherlands)

    Fiers, M.W.E.J.

    2006-01-01

    Large amounts of genome sequence data are available and much more will become available in the near future. A DNA sequence alone has, however, limited use. Genome annotation is required to assign biological interpretation to the DNA sequence. This thesis describ

  7. Annotated Bibliography on Humanistic Education

    Science.gov (United States)

    Ganung, Cynthia

    1975-01-01

    Part I of this annotated bibliography deals with books and articles on such topics as achievement motivation, process education, transactional analysis, discipline without punishment, role-playing, interpersonal skills, self-acceptance, moral education, self-awareness, values clarification, and non-verbal communication. Part II focuses on…

  8. Multicultural Education. An Annotated Bibliography.

    Science.gov (United States)

    Narang, H. L.

    This annotated bibliography contains references to books, journal articles, ERIC documents, doctoral dissertations, and audio-visual materials on the subject of multicultural education. Topics include integrating multiculturalism in school subjects, prejudice and discrimination, intercultural communication, ethnic identity and ethnic bias.…

  9. Nikos Kazantzakis: An Annotated Bibliography.

    Science.gov (United States)

    Qiu, Kui

    This research paper consists of an annotated bibliography about Nikos Kazantzakis, one of the major modern Greek writers and author of "The Last Temptation of Christ," "Zorba the Greek," and many other works. Because of Kazantzakis' position in world literature there are many critical works about him; however, bibliographical control of these works…

  10. Systems Theory and Communication. Annotated Bibliography.

    Science.gov (United States)

    Covington, William G., Jr.

    This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)

  11. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation.

    Science.gov (United States)

    Beijbom, Oscar; Edmunds, Peter J; Roelfsema, Chris; Smith, Jennifer; Kline, David I; Neal, Benjamin P; Dunlap, Matthew J; Moriarty, Vincent; Fan, Tung-Yung; Tan, Chih-Jui; Chan, Stephen; Treibitz, Tali; Gamst, Anthony; Mitchell, B Greg; Kriegman, David

    2015-01-01

    Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time-consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey images captured at four Pacific coral reefs. Inter- and intra-annotator variability among six human experts was quantified and compared to semi- and fully automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys. PMID:26154157
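
    Cover estimation from point annotations amounts to counting the share of points assigned to each category. The sketch below shows that calculation together with one possible semi-automated mode, in which the machine label is accepted for the most confident half of the points and a human label is used for the rest. Ranking by classifier confidence is an assumption made here for illustration; the abstract does not say how the automated 50% were selected, and the data are invented.

```python
from collections import Counter


def cover_estimate(labels):
    """Percent cover per category, computed from a list of point labels."""
    counts = Counter(labels)
    total = len(labels)
    return {category: 100.0 * n / total for category, n in counts.items()}


def semi_automated(points, auto_fraction=0.5):
    """Combine machine and human labels for one set of annotated points.

    points: list of (machine_label, confidence, human_label) tuples.
    The most confident auto_fraction of points keep the machine label;
    the remaining points fall back to the human label. In a real workflow
    the human label would only be collected for that second group.
    """
    ranked = sorted(points, key=lambda p: p[1], reverse=True)
    n_auto = int(len(ranked) * auto_fraction)
    return [p[0] for p in ranked[:n_auto]] + [p[2] for p in ranked[n_auto:]]


# Hypothetical points: (machine label, machine confidence, human label).
points = [
    ("coral", 0.95, "coral"),
    ("turf", 0.55, "CCA"),
    ("coral", 0.90, "coral"),
    ("CCA", 0.60, "CCA"),
]

labels = semi_automated(points, auto_fraction=0.5)
cover = cover_estimate(labels)
print(cover)
```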

  12. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation.

    Directory of Open Access Journals (Sweden)

    Oscar Beijbom

    Full Text Available Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time-consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey images captured at four Pacific coral reefs. Inter- and intra-annotator variability among six human experts was quantified and compared to semi- and fully automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys.

  13. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchael Genomes

    Directory of Open Access Journals (Sweden)

    Friedhelm Pfeiffer

    2015-06-01

    Full Text Available Genome annotation errors are a persistent problem that impedes research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt), to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex.

  14. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

    Directory of Open Access Journals (Sweden)

    Norihiro Maeda

    2006-04-01

    Full Text Available The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

  15. Annotating images by mining image search results

    NARCIS (Netherlands)

    X.J. Wang; L. Zhang; X. Li; W.Y. Ma

    2008-01-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results

  16. Are clickthrough data reliable as image annotations?

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; Vries, A.P. de; Delopoulos, A.

    2009-01-01

    We examine the reliability of clickthrough data as concept-based image annotations, by comparing them against manual annotations, for different concept categories. Our analysis shows that, for many concepts, the image annotations generated by using clickthrough data are reliable, with up to 90% of t

  17. Genome re-annotation: a wiki solution?

    OpenAIRE

    Salzberg, Steven L.

    2007-01-01

    The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.

  18. Functional Annotations of Paralogs: A Blessing and a Curse.

    Science.gov (United States)

    Zallot, Rémi; Harrison, Katherine J; Kolaczkowski, Bryan; de Crécy-Lagard, Valérie

    2016-01-01

    Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines. PMID:27618105

  19. Project Aloha: indexing, highlighting and annotation

    OpenAIRE

    Fallahkhair, Sanaz; Kennedy, Ian

    2010-01-01

    Lifelong learning requires many skills that are often not taught or are poorly taught. Such skills include speed reading, critical analysis, creative thinking, active reading and even a “little” skill like annotation. There are many ways that readers annotate. A short classification of some ways that readers may annotate includes underlining, using coloured highlighters, interlinear notes, marginal notes, and disassociated notes. This paper presents an investigation into the use of a tool for ...

  20. Knowledge Annotation: making implicit knowledge explicit

    CERN Document Server

    Dingli, Alexiei

    2011-01-01

    Did you ever read something in a book, feel the need to comment, take up a pencil and scribble something on the book's text? If you did, you just annotated a book. But that process has now become something fundamental and revolutionary in these days of computing. Annotation is all about adding further information to text, pictures, movies and even to physical objects. In practice, anything which can be identified either virtually or physically can be annotated. In this book, we will delve into what makes annotations, and analyse their significance for the future evolutions of the web. We wil

  1. ANNOTATION SUPPORTED OCCLUDED OBJECT TRACKING

    Directory of Open Access Journals (Sweden)

    Devinder Kumar

    2012-08-01

    Full Text Available Tracking occluded objects at different depths has become an extremely important component of study for any video sequence, with wide applications in object tracking, scene recognition, coding, video editing and mosaicking. The paper studies the ability of annotation to track an occluded object based on pyramids with variation in depth, and establishes a threshold at which the ability of the system to track the occluded object fails. Image annotation is applied on 3 similar video sequences varying in depth. In the experiment, one bike occludes the other at a depth of 60cm, 80cm and 100cm respectively. Another experiment is performed on tracking humans at similar depths to authenticate the results. The paper also computes the frame-by-frame error incurred by the system, supported by detailed simulations. This system can be effectively used to analyze the error in motion tracking and then correct it, leading to flawless tracking. This can be of great interest to computer scientists designing surveillance systems and similar applications.

  2. New directions in biomedical text annotation: definitions, guidelines and corpus construction

    Directory of Open Access Journals (Sweden)

    Rzhetsky Andrey

    2006-07-01

    Full Text Available Abstract Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently
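
    The abstract above reports 70–80% inter-annotator agreement but does not say how the figure was computed. As a minimal sketch, assuming simple pairwise percent agreement (the function name and the toy labels below are hypothetical, not taken from the paper), such a figure could be obtained like this:

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Mean fraction of items on which each pair of annotators chose the same label.

    labels_by_annotator: one list of labels per annotator, all over the same items.
    This is an assumed agreement measure, not necessarily the one used in the paper.
    """
    pair_scores = []
    for a, b in combinations(labels_by_annotator, 2):
        if len(a) != len(b):
            raise ValueError("all annotators must label the same items")
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / len(a))
    return sum(pair_scores) / len(pair_scores)


# Hypothetical toy data: three annotators labelling the polarity of four sentences.
toy = [
    ["pos", "neg", "pos", "pos"],
    ["pos", "neg", "neg", "pos"],
    ["pos", "pos", "pos", "pos"],
]
print(pairwise_agreement(toy))
```

    With twelve annotators and 101 sentences, as in the study, the same function would average over 66 annotator pairs.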

  3. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation

    OpenAIRE

    Oscar Beijbom; Edmunds, Peter J.; Chris Roelfsema; Jennifer Smith; Kline, David I.; Neal, Benjamin P.; Matthew J Dunlap; Vincent Moriarty; Tung-Yung Fan; Chih-Jui Tan; Stephen Chan; Tali Treibitz; Anthony Gamst; B. Greg Mitchell; David Kriegman

    2015-01-01

    Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images capture...

  4. The surplus value of semantic annotations

    NARCIS (Netherlands)

    M. Marx

    2010-01-01

    We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries, an

  5. Annotation of regular polysemy and underspecification

    DEFF Research Database (Denmark)

    Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

    2013-01-01

    We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...

  6. Ground Truth Annotation in T Analyst

    DEFF Research Database (Denmark)

    2015-01-01

    This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...

  7. Creating Gaze Annotations in Head Mounted Displays

    DEFF Research Database (Denmark)

    Mardanbeigi, Diako; Qvarfordt, Pernilla

    2015-01-01

    To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation, ...

  8. DIMA – Annotation guidelines for German intonation

    DEFF Research Database (Denmark)

    Kügler, Frank; Smolibocki, Bernadett; Arnold, Denis;

    2015-01-01

    easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue...

  9. Using Rhetorical Annotations for Generating Video Documentaries

    NARCIS (Netherlands)

    Bocconi, S.; Nack, F.-M.; Hardman, L.

    2005-01-01

    We use rhetorical annotations to specify a generation process that can assemble meaningful video sequences with a communicative goal and an argumentative progression. Our annotation schema encodes the verbal information contained in the audio channel, identifying the claims the interviewees make and

  11. Manual Annotation of Translational Equivalence: The Blinker Project

    CERN Document Server

    Melamed, I D

    1998-01-01

    Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed . The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially-designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably rel...

  12. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix is incomplete; for example, it does not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers considerable time in searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their dataset. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and

  13. Concept annotation in the CRAFT corpus

    Directory of Open Access Journals (Sweden)

    Bada Michael

    2012-07-01

    Full Text Available Abstract Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are

  14. Making web annotations persistent over time

    Energy Technology Data Exchange (ETDEWEB)

    Sanderson, Robert [Los Alamos National Laboratory; Van De Sompel, Herbert [Los Alamos National Laboratory

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.

  15. Annotating user-defined abstractions for optimization

    Energy Technology Data Exchange (ETDEWEB)

    Quinlan, D; Schordan, M; Vuduc, R; Yi, Q

    2005-12-05

    This paper discusses the features of an annotation language that we believe to be essential for optimizing user-defined abstractions. These features should capture semantics of function, data, and object-oriented abstractions, express abstraction equivalence (e.g., a class represents an array abstraction), and permit extension of traditional compiler optimizations to user-defined abstractions. Our future work will include developing a comprehensive annotation language for describing the semantics of general object-oriented abstractions, as well as automatically verifying and inferring the annotated semantics.

  16. Crowdsourcing and annotating NER for Twitter #drift

    DEFF Research Database (Denmark)

    Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

    2014-01-01

    We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible...
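
    The kappa value quoted above is a chance-corrected agreement score. The abstract does not state which variant was used; the sketch below assumes Cohen's kappa between two annotators, and the tag sequences are hypothetical toy data rather than anything from the dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each annotator's label frequencies.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical token-level NER tags from two annotators.
annotator_1 = ["PER", "O", "O", "LOC", "O", "O"]
annotator_2 = ["PER", "O", "O", "O", "O", "O"]
print(cohens_kappa(annotator_1, annotator_2))
```

    In this toy case the raw agreement is 5/6, but kappa is lower because the frequent "O" tag makes chance agreement likely.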

  17. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Full Text Available Abstract Background The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.
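
    The reported timings can be turned into a rough throughput figure. The short calculation below assumes that the shortest time belongs to the smallest dataset and the longest time to the largest one; the abstract gives only the two ranges, so this pairing is an assumption.

```python
def variants_per_second(n_variants, minutes, seconds):
    """Average annotation rate for one run, in variant sites per second."""
    total_seconds = minutes * 60 + seconds
    return n_variants / total_seconds


# Figures quoted in the abstract; pairing of sizes with times is assumed.
largest_run = variants_per_second(3_439_107, 28, 49.8)
smallest_run = variants_per_second(37, 0, 0.17)
print(f"largest dataset:  about {largest_run:.0f} variants/s")
print(f"smallest dataset: about {smallest_run:.0f} variants/s")
```

    Under that assumption the largest run works out to roughly 2,000 variant sites per second, which suggests that fixed start-up cost dominates the very small runs.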

  18. Computational annotation of genes differentially expressed along olive fruit development

    Directory of Open Access Journals (Sweden)

    Martinelli Federico

    2009-10-01

    used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening.

  19. Meteor showers: an annotated catalog

    CERN Document Server

    Kronk, Gary W

    2014-01-01

    Meteor showers are among the most spectacular celestial events that may be observed by the naked eye, and have been the object of fascination throughout human history. In “Meteor Showers: An Annotated Catalog,” the interested observer can access detailed research on over 100 annual and periodic meteor streams in order to capitalize on these majestic spectacles. Each meteor shower entry includes details of their discovery, important observations and orbits, and gives a full picture of duration, location in the sky, and expected hourly rates. Armed with a fuller understanding, the amateur observer can better view and appreciate the shower of their choice. The original book, published in 1988, has been updated with over 25 years of research in this new and improved edition. Almost every meteor shower study is expanded, with some original minor showers being dropped while new ones are added. The book also includes breakthroughs in the study of meteor showers, such as accurate predictions of outbursts as well ...

  20. Modeling Social Annotation: a Bayesian Approach

    CERN Document Server

    Plangprasopchok, Anon

    2008-01-01

    Collaborative tagging systems, such as del.icio.us, CiteULike, and others, allow users to annotate objects, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations, contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make best use of socially-generated data, since they do not take into account variations in individual users' perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes interests of individual annotators into account in order to find hidden topics of annotated objects. Unfortunately, our proposed approach had a number of shortcomings, including overfitting, local maxima and the requirement to specify values for some parameters. In this paper we address these shortcomings in two ways. First, we extend the model to a fully Bayesian framework. Second, we describe an infinite ver...

  1. Annotation of Scientific Summaries for Information Retrieval

    CERN Document Server

    Ibekwe-Sanjuan, Fidelia; Eric, Sanjuan; Eric, Charton

    2011-01-01

    We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence is bearing (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised in order to take into account the distribution of the tags in the corpus. Results, although still preliminary, are encouraging us to pursue this line of work and find better ways of building IR systems that can take into account semantic annotations in a corpus.

  2. SASL: A Semantic Annotation System for Literature

    Science.gov (United States)

    Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin; Jin, Hai

    Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviation and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate the literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviated, which induces ambiguity. Here, SASL uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents the experimental results, which confirm that SASL performs well.
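
    The regular-expression step described above can be illustrated with a small sketch. It is not SASL's actual implementation: it assumes the common "Full Name (ABBR)" pattern, and the names `PAREN_ABBR`, `match_full_name` and `extract_abbreviations` are hypothetical.

```python
import re

# Assumed pattern: an all-capitals abbreviation in parentheses, e.g. "(HMM)".
PAREN_ABBR = re.compile(r"\(([A-Z]{2,10})\)")


def match_full_name(words, abbr):
    """Walk backwards through the preceding words, matching one capitalised
    word per abbreviation letter and skipping lowercase words such as "for"."""
    i = len(words) - 1
    for letter in reversed(abbr):
        while i >= 0 and words[i][0].islower():
            i -= 1
        if i < 0 or words[i][0] != letter:
            return None
        i -= 1
    return " ".join(words[i + 1:])


def extract_abbreviations(text):
    """Map each abbreviation to the list of full names found in front of it.

    A list is kept per abbreviation because, as the abstract notes, several
    full names may share one abbreviation."""
    mapping = {}
    for m in PAREN_ABBR.finditer(text):
        abbr = m.group(1)
        full_name = match_full_name(text[:m.start()].split(), abbr)
        if full_name is not None:
            mapping.setdefault(abbr, []).append(full_name)
    return mapping


sample = ("The paper presents a Semantic Annotation System for Literature (SASL) "
          "that uses a Hidden Markov Model (HMM).")
print(extract_abbreviations(sample))
```

    Choosing among several candidate full names for one abbreviation is the disambiguation step that the abstract assigns to the Hidden Markov Model; it is not attempted here.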

  3. Learning Object Annotation for Agricultural Learning Repositories

    OpenAIRE

    Ebner, Hannes; Manouselis, Nikos; Palmér, Matthias; Enoksson, Fredrik; Palavitsinis, Nikos; Kastrantas, Kostas; Naeve, Ambjörn

    2009-01-01

    This paper introduces a Web-based tool that has been developed to facilitate learning object annotation in agricultural learning repositories with IEEE LOM-compliant metadata. More specifically, it presents how an application profile of the IEEE LOM standard has been developed for the description of learning objects on organic agriculture and agroecology. Then, it describes the design and prototype development of the Organic.Edunet repository tool: a Web-based tool for annotating learning objects ...

  4. Services for annotation of biomedical text

    OpenAIRE

    Hakenberg, Jörg

    2008-01-01

    Motivation: Text mining in the biomedical domain in recent years has focused on the development of tools for recognizing named entities and extracting relations. Such research resulted from the need for such tools as basic components for more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction now have reached a stage where they perform comparably to human annotators (considering inter-annotator agreement, measured in many studies to be aro...

  5. Fluid Annotations in an Open World

    DEFF Research Database (Denmark)

    Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning;

    2001-01-01

    Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to layer fluid annotations and links on top of arbitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required.

  6. Annotating Honorifics Denoting Social Ranking of Referents

    OpenAIRE

    Nariyama, Shigeko; Nakaiwa, Hiromi; Siegel, Melanie

    2011-01-01

    This paper proposes an annotating scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and co...

  7. Collaborative annotation of 3D crystallographic models.

    Science.gov (United States)

    Hunter, J; Henderson, M; Khan, I

    2007-01-01

    This paper describes the AnnoCryst system, a tool that was designed to enable authenticated collaborators to share online discussions about 3D crystallographic structures through the asynchronous attachment, storage, and retrieval of annotations. Annotations are personal comments, interpretations, questions, assessments, or references that can be attached to files, data, digital objects, or Web pages. The AnnoCryst system enables annotations to be attached to 3D crystallographic models retrieved from either private local repositories (e.g., Fedora) or public online databases (e.g., Protein Data Bank or Inorganic Crystal Structure Database) via a Web browser. The system uses the Jmol plugin for viewing and manipulating the 3D crystal structures but extends Jmol by providing an additional interface through which annotations can be created, attached, stored, searched, browsed, and retrieved. The annotations are stored on a standardized Web annotation server (Annotea), which has been extended to support 3D macromolecular structures. Finally, the system is embedded within a security framework that is capable of authenticating users and restricting access only to trusted colleagues.

  8. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and transcriptomes are also being sequenced. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for details. Here we present genome annotations of the JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which was annotated by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  9. An annotation based approach to support design communication

    CERN Document Server

    Hisarciklilar, Onur

    2007-01-01

    The aim of this paper is to propose an approach based on the concept of annotation for supporting design communication. In this paper, we describe a co-operative design case study where we analyse some annotation practices, mainly focused on design minutes recorded during project reviews. We point out specific requirements concerning annotation needs. Based on these requirements, we propose an annotation model, inspired from the Speech Act Theory (SAT) to support communication in a 3D digital environment. We define two types of annotations in the engineering design context, locutionary and illocutionary annotations. The annotations we describe in this paper are materialised by a set of digital artefacts, which have a semantic dimension allowing express/record elements of technical justifications, traces of contradictory debates, etc. In this paper, we first clarify the semantic annotation concept, and we define general properties of annotations in the engineering design context, and the role of annotations in...

  10. A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study

    Directory of Open Access Journals (Sweden)

    Cherubini Marcello

    2010-10-01

    Full Text Available Abstract Background Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of an SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs to Castanea sativa (chestnut). Results A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall, 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental trees, for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a species phylogenetically related to oak, was higher. Conclusion We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
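
    As an illustration of the EST-SSR mining step described in the abstract above, the following is a minimal Python sketch that scans a unigene sequence for perfect di-, tri-, and tetranucleotide repeats with a regular expression. The function name `find_ssrs` and the minimum repeat counts (6, 5, and 4) are assumptions chosen for the example; the abstract does not state the criteria or software the authors actually used.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    `start` is a 0-based offset into `seq`.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# A toy sequence with one (AG)7 and one (CAG)5 repeat.
example = "TT" + "AG" * 7 + "TTC" + "CAG" * 5 + "T"
print(find_ssrs(example))  # [(2, 'AG', 7), (19, 'CAG', 5)]
```

    The regular expression reports the leftmost phase of each repeat, so the motif returned depends on where the repeat begins in the sequence; dedicated SSR-mining tools typically normalise motifs and also handle compound or imperfect repeats, which this sketch does not.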

  11. Web Database Query Interface Annotation Based on User Collaboration

    Institute of Scientific and Technical Information of China (English)

    LIU Wei; LIN Can; MENG Xiaofeng

    2006-01-01

    A vision-based query interface annotation method is used to relate attributes and form elements in form-based web query interfaces; this method can reach an accuracy of 82%. A user participation method is then used to tune the result: users can answer "yes" or "no" for existing annotations, or manually annotate form elements. Mass feedback is added to the annotation algorithm to produce more accurate results. With this approach, query interface annotation can reach perfect accuracy.

  12. GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations.

    Science.gov (United States)

    Sangrador-Vegas, Amaia; Mitchell, Alex L; Chang, Hsin-Yu; Yong, Siew-Yit; Finn, Robert D

    2016-01-01

    The removal of annotation from biological databases is often perceived as an indicator of erroneous annotation. As a corollary, annotation stability is considered to be a measure of reliability. However, diverse data-driven events can affect the stability of annotations in both primary protein sequence databases and the protein family databases that are built upon the sequence databases and used to help annotate them. Here, we describe some of these events and their consequences for the InterPro database, and demonstrate that annotation removal or reassignment is not always linked to incorrect annotation by the curator. Database URL: http://www.ebi.ac.uk/interpro.

  13. Annotated chemical patent corpus: a gold standard for text mining.

    Directory of Open Access Journals (Sweden)

    Saber A Akhondi

    Full Text Available Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups, each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
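
    The abstract above does not say which inter-annotator agreement measure was used. As one common possibility for entity annotation, the sketch below computes a pairwise F-score between two annotators' sets of exact-match annotations. The function name `pairwise_f_score`, the `(start, end, type)` tuple layout, and the sample data are all hypothetical.

```python
def pairwise_f_score(annotations_a, annotations_b):
    """F-score between two annotation sets, counting exact matches only.

    Each annotation is a hashable tuple such as (start, end, entity_type).
    The score is symmetric: swapping the two annotators gives the same value.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # two empty annotation sets agree trivially
    matches = len(set_a & set_b)
    # Equivalent to the harmonic mean of precision and recall.
    return 2 * matches / (len(set_a) + len(set_b))

# Hypothetical annotations of one document by two annotator groups.
group_1 = {(0, 5, "chemical"), (10, 14, "disease"), (20, 25, "chemical")}
group_2 = {(0, 5, "chemical"), (10, 14, "target"), (20, 25, "chemical"),
           (30, 33, "chemical")}
print(round(pairwise_f_score(group_1, group_2), 3))  # 0.571
```

    Exact matching is deliberately strict here: the two groups mark the same span at offsets 10 to 14 but assign different entity types, so it counts as a disagreement.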

  14. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  15. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Science.gov (United States)

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

    The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  16. Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2006-06-01

    Full Text Available Abstract Background Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center http://www.biovirus.org and Viral Bioinformatics – Canada http://www.virology.ca, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. Results GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. Conclusion GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms by transferring reference genome

  17. Automated analysis and annotation of basketball video

    Science.gov (United States)

    Saur, Drew D.; Tan, Yap-Peng; Kulkarni, Sanjeev R.; Ramadge, Peter J.

    1997-01-01

    Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing and data mining projects. A successful video annotation system should provide users with a useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation remains hard with currently available techniques. However, a wide range of video has inherent structure, such that some prior knowledge about the video content can be exploited to improve our understanding of the high-level semantic content. In this paper, we develop tools and techniques for analyzing structured video by using the low-level information available directly from MPEG compressed video. Being able to work directly in the compressed video domain can greatly reduce the processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times. We expect our approach can also be extended to structured video in other domains.

  18. Critical Assessment of Function Annotation Meeting, 2011

    Energy Technology Data Exchange (ETDEWEB)

    Friedberg, Iddo

    2015-01-21

    The Critical Assessment of Function Annotation meeting was held July 14-15, 2011 at the Austria Conference Center in Vienna, Austria. There were 73 registered delegates at the meeting. We thank the DOE for this award. It helped us organize and support a scientific meeting AFP 2011 as a special interest group (SIG) meeting associated with the ISMB 2011 conference. The conference was held in Vienna, Austria, in July 2011. The AFP SIG was held on July 15-16, 2011 (immediately preceding the conference). The meeting consisted of two components, the first being a series of talks (invited and contributed) and discussion sections dedicated to protein function research, with an emphasis on the theory and practice of computational methods utilized in functional annotation. The second component provided a large-scale assessment of computational methods through participation in the Critical Assessment of Functional Annotation (CAFA).

  19. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas;

    2007-01-01

    - and intergenomic regions. The presence of multiple coding regions complicates the concept of Ka/Ks ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping...... may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We...... obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag...

  20. Graph Annotations in Modeling Complex Network Topologies

    CERN Document Server

    Dimitropoulos, Xenofontas; Vahdat, Amin; Riley, George

    2007-01-01

    The coarsest approximation of the structure of a complex network, such as the Internet, is a simple undirected unweighted graph. This approximation, however, loses too much detail. In reality, objects represented by vertices and edges in such a graph possess some non-trivial internal structure that varies across and differentiates among distinct types of links or nodes. In this work, we abstract such additional information as network annotations. We introduce a network topology modeling framework that treats annotations as an extended correlation profile of a network. Assuming we have this profile measured for a given network, we present an algorithm to rescale it in order to construct networks of varying size that still reproduce the original measured annotation profile. Using this methodology, we accurately capture the network properties essential for realistic simulations of network applications and protocols, or any other simulations involving complex network topologies, including modeling and simulation ...

  1. I2Cnet medical image annotation service.

    Science.gov (United States)

    Chronaki, C E; Zabulis, X; Orphanoudakis, S C

    1997-01-01

    I2Cnet (Image Indexing by Content network) aims to provide services related to the content-based management of images in healthcare over the World-Wide Web. Each I2Cnet server maintains an autonomous repository of medical images and related information. The annotation service of I2Cnet allows specialists to interact with the contents of the repository, adding comments or illustrations to medical images of interest. I2Cnet annotations may be communicated to other users via e-mail or posted to I2Cnet for inclusion in its local repositories. This paper discusses the annotation service of I2Cnet and argues that such services pave the way towards the evolution of active digital medical image libraries.

  2. Corpus annotation for mining biomedical events from literature

    Directory of Open Access Journals (Sweden)

    Tsujii Jun'ichi

    2008-01-01

    Full Text Available Abstract Background Advanced Text Mining (TM), such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering, has increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.

  3. Protein function annotation by local binding site surface similarity.

    Science.gov (United States)

    Spitzer, Russell; Cleves, Ann E; Varela, Rocco; Jain, Ajay N

    2014-04-01

    Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex-PSIM, a previously reported surface-based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ∼60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins.

  4. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  5. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    Full Text Available We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
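
    To make the range operations named in the abstract above concrete, the following is a small Python sketch of overlap detection and coverage calculation on closed, 1-based integer intervals. It is not the IRanges or GenomicRanges API (those are R packages); the function names `find_overlaps` and `coverage` and the quadratic overlap scan are simplifications assumed for illustration, whereas the packages themselves are described as using efficient algorithms.

```python
def find_overlaps(query, subject):
    """Return (query_index, subject_index) pairs whose ranges overlap.

    Ranges are (start, end) tuples, 1-based and closed at both ends.
    This is a simple all-pairs scan, adequate only for small inputs.
    """
    hits = []
    for qi, (q_start, q_end) in enumerate(query):
        for si, (s_start, s_end) in enumerate(subject):
            # Two closed intervals overlap when each starts before the other ends.
            if q_start <= s_end and s_start <= q_end:
                hits.append((qi, si))
    return hits

def coverage(ranges, length):
    """Return per-position depth for positions 1..length.

    Assumes every range lies within 1..length.
    """
    diff = [0] * (length + 2)  # difference array, index 0 unused
    for start, end in ranges:
        diff[start] += 1
        diff[end + 1] -= 1
    depth, result = 0, []
    for pos in range(1, length + 1):
        depth += diff[pos]
        result.append(depth)
    return result

reads = [(1, 3), (10, 12)]
exons = [(2, 5), (6, 9), (12, 20)]
print(find_overlaps(reads, exons))    # [(0, 0), (1, 2)]
print(coverage([(1, 3), (2, 5)], 6))  # [1, 2, 2, 1, 1, 0]
```

    The difference-array approach computes coverage in time proportional to the number of ranges plus the sequence length, rather than touching every position of every range.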

  6. Annotation for information extraction from mammography reports.

    Science.gov (United States)

    Bozkurt, Selen; Gulkesen, Kemal Hakan; Rubin, Daniel

    2013-01-01

    Inter- and intra-observer variability in mammographic interpretation is a challenging problem, and decision support systems (DSS) may help to reduce variation in practice. Since radiology reports are created as unstructured text, natural language processing (NLP) techniques are needed to extract structured information from them in order to provide the inputs to DSS. Before creating NLP systems, producing a high-quality annotated data set is essential. The goal of this project is to develop an annotation schema to guide the information extraction tasks needed for free-text mammography reports. PMID:23823416

  7. Solar Tutorial and Annotation Resource (STAR)

    Science.gov (United States)

    Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

    2009-12-01

    We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our “STAR" suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans’ superior ability (over computers) to identify many events. Annotation tools include time slice generation to quantify loop oscillations, the interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably utilize annotation software with new data. Upon completion of our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the data analysis of very large datasets from future solar satellites, and increased public interest and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO’s advancements in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects develop automated feature recognition software, so that computers can complete much of the initial event tagging and analysis, still, that software cannot annotate features such as sigmoids, coronal magnetic loops, coronal dimming, etc., due to large amounts of data concentrated in relatively small areas. Previously, solar physicists manually annotated these features, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to efficiently process these data. Providing analysis tools and data access to students and the public have proven

  8. Ranking Biomedical Annotations with Annotator’s Semantic Relevancy

    Directory of Open Access Journals (Sweden)

    Aihua Wu

    2014-01-01

    Full Text Available Biomedical annotation is a common and effective artifact for researchers to discuss, show opinions, and share discoveries. It is becoming increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users who need to get information efficiently. As the annotator’s knowledge about the annotated entity normally determines the quality of the annotations, we evaluate that knowledge, that is, the semantic relationship between them, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to users’ votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large.

  9. An annotated history of container candidate material selection

    International Nuclear Information System (INIS)

    This paper documents events in the Nevada Nuclear Waste Storage Investigations (NNWSI) Project that have influenced the selection of metals and alloys proposed for fabrication of waste package containers for permanent disposal of high-level nuclear waste in a repository at Yucca Mountain, Nevada. The time period from 1981 to 1988 is covered in this annotated history. The history traces the candidate materials that have been considered at different stages of site characterization planning activities. At present, six candidate materials are considered and described in the 1988 Consultation Draft of the NNWSI Site Characterization Plan (SCP). The six materials are grouped into two alloy families, copper-base materials and iron to nickel-base materials with an austenitic structure. The three austenitic candidates resulted from a 1983 survey of a longer list of candidate materials; the other three candidates resulted from a special request from DOE in 1984 to evaluate copper and copper-base alloys. 24 refs., 2 tabs

  10. Bibliografia de Aztlan: An Annotated Chicano Bibliography.

    Science.gov (United States)

    Barrios, Ernie, Ed.

    More than 300 books and articles published from 1920 to 1971 are reviewed in this annotated bibliography of literature on the Chicano. The citations and reviews are categorized by subject area and deal with contemporary Chicano history, education, health, history of Mexico, literature, native Americans, philosophy, political science, pre-Columbian…

  11. Annotated bibliography of psychomotor testing. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Ervin, C.

    1987-03-01

    An annotated bibliography of 67 publications in the field of psychomotor testing has been prepared. The collection includes technical reports, journal articles, papers presented at scientific meetings, books and conference proceedings. The publications were assembled as preliminary work in the development of a dexterity test battery designed to measure the effects of chemical-defense-treatment drugs.

  12. Annotated Bibliography of EDGE2D Use

    International Nuclear Information System (INIS)

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to their intended use, as well as gain ideas about other possible applications by scanning the attached tables.

  13. Genotyping and annotation of Affymetrix SNP arrays

    DEFF Research Database (Denmark)

    Lamy, Philippe; Andersen, Claus Lindbjerg; Wikman, Friedrik;

    2006-01-01

    allows us to annotate SNPs that have poor performance, either because of poor experimental conditions or because for one of the alleles the probes do not behave in a dose-response manner. Generally, our method agrees well with a method developed by Affymetrix. When both methods make a call they agree...

  14. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  15. Organizational and Intercultural Communication: An Annotated Bibliography.

    Science.gov (United States)

    Constantinides, Helen; St. Amant, Kirk; Kampf, Connie

    2001-01-01

    Presents a 27-item annotated bibliography that overviews theories of organization from the viewpoint of culture, using five themes of organizational research as a framework. Notes that each section introduces specific theories of international, intercultural, or organizational communication, building upon them through a series of related articles,…

  16. Effective function annotation through catalytic residue conservation.

    Science.gov (United States)

    George, Richard A; Spriggs, Ruth V; Bartlett, Gail J; Gutteridge, Alex; MacArthur, Malcolm W; Porter, Craig T; Al-Lazikani, Bissan; Thornton, Janet M; Swindells, Mark B

    2005-08-30

    Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-d-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.

  17. Statistical mechanics of ontology based annotations

    CERN Document Server

    Hoyle, David C

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotate...

  18. Statistical mechanics of ontology based annotations

    Science.gov (United States)

    Hoyle, David C.; Brass, Andrew

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotated increases. In doing so we provide a further possible measure for assessment of ontologies.

  19. Nutrition & Adolescent Pregnancy: A Selected Annotated Bibliography.

    Science.gov (United States)

    National Agricultural Library (USDA), Washington, DC.

    This annotated bibliography on nutrition and adolescent pregnancy is intended to be a source of technical assistance for nurses, nutritionists, physicians, educators, social workers, and other personnel concerned with improving the health of teenage mothers and their babies. It is divided into two major sections. The first section lists selected…

  20. La Mujer Chicana: An Annotated Bibliography, 1976.

    Science.gov (United States)

    Chapa, Evey, Ed.; And Others

    Intended to provide interested persons, researchers, and educators with information about "la mujer Chicana", this annotated bibliography cites 320 materials published between 1916 and 1975, with the majority being between 1960 and 1975. The 12 sections cover the following subject areas: Chicana publications; Chicana feminism and "el movimiento";…

  1. Annotated Bibliography of EDGE2D Use

    Energy Technology Data Exchange (ETDEWEB)

    J.D. Strachan and G. Corrigan

    2005-06-24

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables.

  2. An Annotated Publications List on Homelessness.

    Science.gov (United States)

    Tutunjian, Beth Ann

    This annotated publications list on homelessness contains citations for 19 publications, most of which deal with problems of alcohol or drug abuse among homeless persons. Citations are listed alphabetically by author and cover the topics of homelessness and alcoholism, drug abuse, public policy, research methodologies, mental illness, alcohol- and…

  3. Teleconferencing, an annotated bibliography, volume 3

    Science.gov (United States)

    Shervis, K.

    1971-01-01

    In this annotated and indexed listing of works on teleconferencing, emphasis has been placed upon teleconferencing as real-time, two way audio communication with or without visual aids. However, works on the use of television in two-way or multiway nets, data transmission, regional communications networks and on telecommunications in general are also included.

  4. Postsecondary Peer Cooperative Learning Programs: Annotated Bibliography

    Science.gov (United States)

    Arendale, David R., Comp.

    2005-01-01

    Purpose: This annotated bibliography is focused intentionally on postsecondary peer cooperative learning programs that increase student achievement. Peer learning has been popular in education for decades. As both a pedagogy and learning strategy, it has been frequently adapted for a wide range of academic content areas at the elementary,…

  5. Small Group Communication: An Annotated Bibliography.

    Science.gov (United States)

    Gouran, Dennis S.; Guadagnino, Christopher S.

    This annotated bibliography includes sources of information that are primarily concerned with problem solving, decision making, and processes of social influence in small groups, and secondarily deal with other aspects of communication and interaction in groups, such as conflict management and negotiation. The 57 entries, all dating from 1980…

  6. Ludwig von Mises: An Annotated Bibliography.

    Science.gov (United States)

    Gordon, David

    A 117-item annotated bibliography of books, articles, essays, lectures, and reviews by economist Ludwig von Mises is presented. The bibliography is arranged chronologically, and is followed by an alphabetical listing of the citations, excluding books. An index and information on the Ludwig von Mises Institute at Auburn University (Alabama) are…

  7. Kwanzaa: A Selective Annotated Bibliography for Teachers.

    Science.gov (United States)

    Dupree, Sandra K., Comp.; Gillum, Holly A., Comp.

    This annotated bibliography about Kwanzaa, an end-of-the-year holiday that emphasizes an appreciation for the culture of African Americans, aims to provide ready access to information for classroom teachers. Noting that Kwanzaa (celebrated from December 26 to January 1) is an important cultural event, the bibliography states that the festival…

  8. DNAVis: interactive visualization of comparative genome annotations

    NARCIS (Netherlands)

    Fiers, M.W.E.J.; Wetering, van de H.; Peeters, T.H.J.M.; Wijk, van J.J.; Nap, J.P.H.

    2006-01-01

    The software package DNAVis offers a fast, interactive and real-time visualization of DNA sequences and their comparative genome annotations. DNAVis implements advanced methods of information visualization such as linked views, perspective walls and semantic zooming, in addition to the display of he

  9. Semantic Annotation to Support Automatic Taxonomy Classification

    DEFF Research Database (Denmark)

    Kim, Sanghee; Ahmed, Saeema; Wallace, Ken

    2006-01-01

    This paper presents a new taxonomy classification method that generates classification criteria from a small number of important sentences identified through semantic annotations, e.g. cause-effect. Rhetorical Structure Theory (RST) is used to discover the semantics (Mann et al. 1988). Specifically...

  10. Skin Cancer Education Materials: Selected Annotations.

    Science.gov (United States)

    National Cancer Inst. (NIH), Bethesda, MD.

    This annotated bibliography presents 85 entries on a variety of approaches to cancer education. The entries are grouped under three broad headings, two of which contain smaller sub-divisions. The first heading, Public Education, contains prevention and general information, and non-print materials. The second heading, Professional Education,…

  11. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  12. Computer systems for annotation of single molecule fragments

    Energy Technology Data Exchange (ETDEWEB)

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
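
    As an illustration of the step described above, where marked restriction enzyme cut sites determine the restriction fragments of a molecule, the following minimal Python sketch turns a list of cut-site positions into fragment lengths. The function name, the coordinate convention (positions measured from one end of the molecule) and the sample numbers are assumptions made for this example; they are not taken from the disclosure.

```python
def restriction_fragments(molecule_length, cut_sites):
    """Return fragment lengths, in order, for a linear molecule.

    Hypothetical helper: cut_sites are positions measured from one end of
    the molecule; positions outside the molecule are ignored.
    """
    # Keep only distinct interior cut sites, ordered along the molecule.
    sites = sorted({s for s in cut_sites if 0 < s < molecule_length})
    # Fragment boundaries are the two molecule ends plus every cut site.
    bounds = [0] + sites + [molecule_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Two marked cut sites on a 1000-unit molecule give three fragments.
    print(restriction_fragments(1000, [700, 250]))
```

    An uncut molecule yields a single fragment equal to its full length.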

  13. Annotation Method (AM): SE22_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available ether with predicted molecular formulae and putative structures, were provided as metabolite annotations. Comparison with public data...bases was performed. A grading system was introduced to describe the evidence supporting the annotations. ...

  14. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Full Text Available Abstract Background Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 to 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  15. Ontology Learning and Semantic Annotation: a Necessary Symbiosis

    OpenAIRE

    Giovannetti, Emiliano; Marchi, Simone; Montemagni, Simonetta; Bartolini, Roberto

    2008-01-01

    Semantic annotation of text requires the dynamic merging of linguistically structured information and a "world model", usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology through a semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping paradox requires an incremental process of annotation-acquisition-annotation, whereby domain-specific...

  16. SURFACE: a database of protein surface regions for functional annotation

    OpenAIRE

    Ferrè, Fabrizio; Ausiello, Gabriele; Zanzoni, Andreas; Helmer-Citterich, Manuela

    2004-01-01

    The SURFACE (SUrface Residues and Functions Annotated, Compared and Evaluated, URL http://cbm.bio.uniroma2.it/surface/) database is a repository of annotated and compared protein surface regions. SURFACE contains the results of a large-scale protein annotation and local structural comparison project. A non-redundant set of protein chains is used to build a database of protein surface patches, defined as putative surface functional sites. Each patch is annotated with sequence and structure-der...

  17. AnnaBot: A Static Verifier for Java Annotation Usage

    OpenAIRE

    Ian Darwin

    2010-01-01

    This paper describes AnnaBot, one of the first tools to verify correct use of Annotation-based metadata in the Java programming language. These Annotations are a standard Java 5 mechanism used to attach metadata to types, methods, or fields without using an external configuration file. A binary representation of the Annotation becomes part of the compiled “.class” file, for inspection by another component or library at runtime. Java Annotations were introduced into the Java language in 2004 a...

  18. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
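
    The idea of an extended annotation built by combining individual ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the priority-order merge rule, the function names and the toy gene identifiers are hypothetical and do not describe BEACON's actual algorithm or interface.

```python
def extend_annotation(annotations):
    """Combine per-method annotations into one extended annotation.

    annotations: list of dicts mapping gene id -> function string, or None
    when a method assigned no function. Earlier dicts take priority
    (an assumed rule for this sketch, not BEACON's documented behavior).
    """
    genes = set().union(*annotations)
    extended = {}
    for gene in sorted(genes):
        # Take the first non-empty function assignment across methods.
        extended[gene] = next((a[gene] for a in annotations if a.get(gene)), None)
    return extended


def count_annotated(annotation):
    """Number of genes that carry a putative function."""
    return sum(1 for function in annotation.values() if function)


if __name__ == "__main__":
    am1 = {"g1": "kinase", "g2": None, "g3": None}        # toy output of method 1
    am2 = {"g1": "protein kinase", "g2": "transporter"}   # toy output of method 2
    extended = extend_annotation([am1, am2])
    print(extended)
    print(count_annotated(am1), "->", count_annotated(extended))
```

    Comparing `count_annotated` before and after merging gives the kind of gain in functionally annotated genes that the abstract reports.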

  19. A unified representation for morphological, syntactic, semantic, and referential annotations

    OpenAIRE

    Hinrichs, Erhard W.; Kübler, Sandra; Naumann, Karin

    2008-01-01

    This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the lite...

  20. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.;

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced...

  1. Automatic annotation of head velocity and acceleration in Anvil

    DEFF Research Database (Denmark)

    Jongejan, Bart

    2012-01-01

    We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements...
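
    Velocity and acceleration in two dimensions can be derived from tracked head positions by finite differences. The Python sketch below assumes per-frame (x, y) positions and a known frame rate; the function name and the sample values are hypothetical, and the plugin's actual computation may differ.

```python
def finite_difference(samples, fps):
    """Forward difference of a sequence of (x, y) samples, scaled by frame rate.

    Applied to positions it yields velocities; applied to velocities it
    yields accelerations. The result is one element shorter than the input.
    """
    return [
        ((x1 - x0) * fps, (y1 - y0) * fps)
        for (x0, y0), (x1, y1) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    fps = 2  # assumed frame rate, frames per second
    positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]  # toy track
    velocity = finite_difference(positions, fps)
    acceleration = finite_difference(velocity, fps)
    print(velocity)
    print(acceleration)
```

    Real tracker output is noisy, so a smoothing step before differencing would usually be needed; it is omitted here to keep the sketch short.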

  2. Systematic interpretation of microarray data using experiment annotations

    Directory of Open Access Journals (Sweden)

    Frohme Marcus

    2006-12-01

    Full Text Available Abstract Background Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.

  3. Model and Interoperability using Meta Data Annotations

    Science.gov (United States)

    David, O.

    2011-12-01

    Software frameworks and architectures are in need of meta data to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format, consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing meta data, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings; however, the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach for meta data representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programming language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interface (API) calls, is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to its originating code. Since models and

  4. How well are protein structures annotated in secondary databases?

    Science.gov (United States)

    Rother, Kristian; Michalsky, Elke; Leser, Ulf

    2005-09-01

    We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

  5. Image Semantic Automatic Annotation by Relevance Feedback

    Institute of Scientific and Technical Information of China (English)

    ZHANG Tong-zhen; SHEN Rui-min

    2007-01-01

    A large semantic gap exists between content based index retrieval (CBIR) and high-level semantics, so additional semantic information should be attached to the images. This involves three aspects: the semantic representation model, semantic information building, and semantic retrieval techniques. In this paper, we introduce an associated semantic network and an automatic semantic annotation system. In the system, a semantic network model is employed as the semantic representation model; it uses semantic keywords, linguistic ontology and low-level features in calculating semantic similarity. Through several rounds of users' relevance feedback, the semantic network is enriched automatically. To speed up the growth of the semantic network and obtain a balanced annotation, semantic seeds and semantic loners are employed.

  6. Exploiting Social Annotation for Automatic Resource Discovery

    CERN Document Server

    Plangprasopchok, Anon

    2007-01-01

    Information integration applications, such as mediators or mashups, that require access to information resources currently rely on users manually discovering and integrating them in the application. Manual resource discovery is a slow process, requiring the user to sift through results obtained via keyword-based search. Although search methods have advanced to include evidence from document contents, its metadata and the contents and link structure of the referring pages, they still do not adequately cover information sources, often called "the hidden Web", that dynamically generate documents in response to a query. The recently popular social bookmarking sites, which allow users to annotate and share metadata about various information sources, provide rich evidence for resource discovery. In this paper, we describe a probabilistic model of the user annotation process in a social bookmarking system, del.icio.us. We then use the model to automatically find resources relevant to a particular information dom...

  7. A Concept Annotation System for Clinical Records

    CERN Document Server

    Kang, Ning; Afzal, Zubair; Singh, Bharat; Schuemie, Martijn J; van Mulligen, Erik M; Kors, Jan A

    2010-01-01

    Unstructured information comprises a valuable source of data in clinical records. For text mining in clinical records, concept extraction is the first step in finding assertions and relationships. This study presents a system developed for the annotation of medical concepts, including medical problems, tests, and treatments, mentioned in clinical records. The system combines six publicly available named entity recognition systems into one framework, and uses a simple voting scheme that allows precision and recall to be tuned to specific needs. The system provides both a web service interface and a UIMA interface which can be easily used by other systems. The system was tested in the fourth i2b2 challenge and achieved an F-score of 82.1% for the concept exact match task, a score which is among the top-ranking systems. To our knowledge, this is the first publicly available clinical record concept annotation system.
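
    A simple voting scheme over several recognizers, with a vote threshold that trades precision against recall, might look like the Python sketch below. The span representation, the function names and the toy outputs are assumptions made for illustration; the abstract does not specify how the published system counts votes.

```python
from collections import Counter


def vote(system_outputs, min_votes):
    """Keep every span proposed by at least min_votes systems.

    system_outputs: one iterable of (start, end, concept_type) tuples per
    recognizer. A higher min_votes favors precision, a lower one recall.
    """
    counts = Counter()
    for spans in system_outputs:
        counts.update(set(spans))  # each system votes at most once per span
    return {span for span, n in counts.items() if n >= min_votes}


def f_score(predicted, gold):
    """Balanced F-score for exact-match spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    outputs = [  # toy outputs of three hypothetical recognizers
        [(0, 5, "problem"), (10, 14, "test")],
        [(0, 5, "problem")],
        [(0, 5, "problem"), (20, 25, "treatment")],
    ]
    gold = {(0, 5, "problem"), (10, 14, "test")}
    for threshold in (1, 2, 3):
        merged = vote(outputs, threshold)
        print(threshold, sorted(merged), round(f_score(merged, gold), 3))
```

    Sweeping the threshold from one vote to all systems is one way to pick an operating point for a given application.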

  8. A Novel Technique to Image Annotation using Neural Network

    Directory of Open Access Journals (Sweden)

    Pankaj Savita

    2013-03-01

    Full Text Available Automatic annotation of digital pictures is a key technology for managing and retrieving images from large image collections. Traditional image semantics extraction and representation schemes are commonly divided into two categories, namely visual features and text annotations. However, visual features are difficult to extract and are often semantically inconsistent. On the other hand, image semantics can be well represented by text annotations, and it is also easier to retrieve images according to their annotations. Traditional image annotation techniques are time-consuming and require a lot of human effort. In this paper we propose a novel neural-network-based approach to the problem of image annotation. The approach is applied to an image data set. Our main work is focused on image annotation using a multilayer perceptron (MLP), and gives a clear-cut idea of how a multilayer perceptron with special features can be applied. The MLP algorithm helps us to discover the concealed relations between image data and annotation data, and to annotate images according to such relations. By using this algorithm we can save memory space and, in the case of web applications, image transfer and download should be fast. This paper reviews 50 image annotation systems using supervised machine learning techniques to annotate images for image retrieval. Results obtained show that the multilayer perceptron neural network classifier outperforms the conventional DST technique.

  9. FragKB: structural and literature annotation resource of conserved peptide fragments and residues.

    Directory of Open Access Journals (Sweden)

    Ashish V Tendulkar

    Full Text Available BACKGROUND: FragKB (Fragment Knowledgebase is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.

  10. Html template system using java annotations

    OpenAIRE

    Speck, Peter

    2007-01-01

    The problems that motivate this project are (1) to solve the lack of separation between html templates and java code when using existing template systems (e.g. embedded language or macros), (2) to solve the lack of scoped declaration of macros and java variables inside template loops, and (3) to solve the lack of validation of template macro definitions at compile time, to help find bugs before the web applications are deployed. Annotations are used as metadata format for...

  11. Deburring: an annotated bibliography. Volume V

    International Nuclear Information System (INIS)

    An annotated summary of 204 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred

  12. About Certain Semantic Annotation in Parallel Corpora

    OpenAIRE

    Violetta Koseska-Toszewa

    2015-01-01

    The semantic notation analyzed in this work is contained in the second stream of semantic theories presented here – in the direct approach semantics. We used this stream in our work on the Bulgarian-Polish Contrastive Grammar. Our semantic notation distinguishes quantificational meanings of names and predicates, and indicates aspectual and temporal meanings of verbs. It relies on logical scope-based quantification and on the contemporary t...

  13. Deburring: an annotated bibliography. Volume VI

    Energy Technology Data Exchange (ETDEWEB)

    Gillespie, L.K.

    1980-07-01

    An annotated summary of 138 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese, and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred.

  14. MOCAT2: a metagenomic assembly, annotation and profiling framework

    Science.gov (United States)

    Kultima, Jens Roat; Coelho, Luis Pedro; Forslund, Kristoffer; Huerta-Cepas, Jaime; Li, Simone S.; Driessen, Marja; Voigt, Anita Yvonne; Zeller, Georg; Sunagawa, Shinichi; Bork, Peer

    2016-01-01

    Summary: MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes. Availability and Implementation: MOCAT2 is implemented in Perl 5 and Python 2.7, designed for 64-bit UNIX systems and offers support for high-performance computer usage via LSF, PBS or SGE queuing systems; source code is freely available under the GPL3 license at http://mocat.embl.de. Contact: bork@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153620

  15. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Full Text Available Abstract Background Magnaporthe oryzae is the causal agent of rice blast, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 (http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html). However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
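
    The reciprocal best hits step mentioned in the Methods of this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the protein identifiers and alignment scores below are hypothetical, and the sketch assumes that pairwise similarity scores in both search directions are already available (for example from an alignment tool).

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    fungus_vs_reference = {
        ("mo_prot1", "ref_protA"): 250.0,
        ("mo_prot1", "ref_protB"): 90.0,
        ("mo_prot2", "ref_protB"): 120.0,
    }
    reference_vs_fungus = {
        ("ref_protA", "mo_prot1"): 240.0,
        ("ref_protB", "mo_prot1"): 95.0,
        ("ref_protB", "mo_prot2"): 80.0,
    }
    print(reciprocal_best_hits(fungus_vs_reference, reference_vs_fungus))
```

    In this toy input only the pair (mo_prot1, ref_protA) is kept, because ref_protB's best hit is mo_prot1 rather than mo_prot2. A real workflow would add score or E-value thresholds and the manual review described in the record.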

  16. An integrated approach for genome annotation of the eukaryotic thermophile Chaetomium thermophilum.

    Science.gov (United States)

    Bock, Thomas; Chen, Wei-Hua; Ori, Alessandro; Malik, Nayab; Silva-Martin, Noella; Huerta-Cepas, Jaime; Powell, Sean T; Kastritis, Panagiotis L; Smyshlyaev, Georgy; Vonkova, Ivana; Kirkpatrick, Joanna; Doerks, Tobias; Nesme, Leo; Baßler, Jochen; Kos, Martin; Hurt, Ed; Carlomagno, Teresa; Gavin, Anne-Claude; Barabas, Orsolya; Müller, Christoph W; van Noort, Vera; Beck, Martin; Bork, Peer

    2014-12-16

    The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum is similar to its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich source to the structural biology community. PMID:25398899

  17. FINDING GENERIFS VIA GENE ONTOLOGY ANNOTATIONS

    OpenAIRE

    Lu, Zhiyong; Cohen, K Bretonnel; Hunter, Lawrence

    2006-01-01

    A Gene Reference Into Function (GeneRIF) is a concise phrase describing a function of a gene in the Entrez Gene database. Applying techniques from the area of natural language processing known as automatic summarization, it is possible to link the Entrez Gene database, the Gene Ontology, and the biomedical literature. A system was implemented that automatically suggests a sentence from a PubMed/MEDLINE abstract as a candidate GeneRIF by exploiting a gene’s GO annotations along with location f...

  18. Eval: A software package for analysis of genome annotations

    OpenAIRE

    Brent Michael R; Keibler Evan

    2003-01-01

    Abstract Summary Eval is a flexible tool for analyzing the performance of gene annotation systems. It provides summaries and graphical distributions for many descriptive statistics about any set of annotations, regardless of their source. It also compares sets of predictions to standard annotations and to one another. Input is in the standard Gene Transfer Format (GTF). Eval can be run interactively or via the command line, in which case output options include easily parsable tab-delimited fi...

  19. Barcode Annotations for Medical Image Retrieval: A Preliminary Investigation

    OpenAIRE

    Tizhoosh, Hamid R.

    2015-01-01

    This paper proposes to generate and to use barcodes to annotate medical images and/or their regions of interest such as organs, tumors and tissue types. A multitude of efficient feature-based image retrieval methods already exist that can assign a query image to a certain image class. Visual annotations may help to increase the retrieval accuracy if combined with existing feature-based classification paradigms. Whereas with annotations we usually mean textual descriptions, in this paper barco...

  20. An Extensible, Kinematically-Based Gesture Annotation Scheme

    OpenAIRE

    Martell, Craig H.

    2002-01-01

    Chapter 1 in the book: Advances in Natural Multimodal Dialogue Systems Annotated corpora have played a critical role in speech and natural language research; and, there is an increasing interest in corpora-based research in sign language and gesture as well. We present a non-semantic, geometrically-based annotation scheme, FORM, which allows an annotator to capture the kinematic information in a gesture just from videos of speakers. In addition, FORM stores this gestural in...

  1. AnnTools: a comprehensive and versatile annotation toolkit for genomic variants

    OpenAIRE

    Makarov, Vladimir; O'Grady, Tina; Cai, Guiqing; Lihm, Jayon; Buxbaum, Joseph D; Yoon, Seungtai

    2012-01-01

    Summary: AnnTools is a versatile bioinformatics application designed for comprehensive annotation of a full spectrum of human genome variation: novel and known single-nucleotide substitutions (SNP/SNV), short insertions/deletions (INDEL) and structural variants/copy number variation (SV/CNV). The variants are interpreted by interrogating data compiled from 15 constantly updated sources. In addition to detailed functional characterization of the coding variants, AnnTools searches for overlaps ...

  2. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Science.gov (United States)

    Ely, Bert; Scott, LaTia Etheredge

    2014-01-01

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
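
    The quantity this record relies on, GC content at the third codon position, can be computed with a few lines of Python. This is only a minimal sketch of the basic measure for one reading frame; it is not the frame plot produced by Artemis, and the example sequence is made up for illustration.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of G or C bases at third codon positions.

    The sequence is read in frame from its first base; a trailing
    partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    third_bases = seq[2:usable:3]     # bases at positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence must contain at least one full codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)


if __name__ == "__main__":
    # Made-up coding sequence: ATG GCC GAC CTG TAA
    print(f"GC3 = {gc3_content('ATGGCCGACCTGTAA'):.2f}")
```

    In a GC-rich genome, a true coding frame would be expected to show a higher value than the other two frames, which is the signal the record describes comparing against annotated gene locations.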

  3. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  4. Review of actinide-sediment reactions with an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Ames, L.L.; Rai, D.; Serne, R.J.

    1976-02-10

    The annotated bibliography is divided into sections on chemistry and geochemistry, migration and accumulation, cultural distributions, natural distributions, and bibliographies and annual reviews. (LK)

  5. A Novel Approach to Semantic and Coreference Annotation at LLNL

    Energy Technology Data Exchange (ETDEWEB)

    Firpo, M

    2005-02-04

    A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (5) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.

  6. Semantator: semantic annotator for converting biomedical text to linked data.

    Science.gov (United States)

    Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

    2013-10-01

    More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community, who provided positive feedback. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference.

  7. Introduction to annotated logics foundations for paracomplete and paraconsistent reasoning

    CERN Document Server

    Abe, Jair Minoro; Nakamatsu, Kazumi

    2015-01-01

    This book is written as an introduction to annotated logics. It provides logical foundations for annotated logics, discusses some interesting applications of these logics and also includes the authors' contributions to annotated logics. The central idea of the book is to show how annotated logic can be applied as a tool to solve problems of technology and of applied science. The book will be of interest to pure and applied logicians, philosophers, and computer scientists as a monograph on a kind of paraconsistent logic. Lay readers will also benefit from reading it.

  8. EFFICIENT VIDEO ANNOTATIONS BY AN IMAGE GROUPS

    Directory of Open Access Journals (Sweden)

    K . Mahi balan

    2015-10-01

    Full Text Available Searching for desired events in uncontrolled videos is a challenging task, so research mainly focuses on learning concepts from numerous labelled videos. However, it is time-consuming and labour-intensive to collect the large number of labelled videos required for training event models under various conditions. To avoid this problem, we propose to leverage abundant Web images for videos, since Web images are a rich source of information, with many events roughly annotated and taken under various conditions. However, information from the Web is difficult to use directly, so brute-force knowledge transfer from images may hurt video annotation performance. We therefore propose a novel Group-based Domain Adaptation learning framework to leverage different groups of knowledge (source domain) queried from a Web image search engine for consumer videos (target domain). Unlike older methods that use multiple source domains of images, our method groups the Web images according to their intrinsic semantic relationships instead of their source. Specifically, two different types of groups (event-specific groups and concept-specific groups) are exploited to describe, respectively, the event-level and concept-level semantic meanings of target-domain videos.

  9. Comparing functional annotation analyses with Catmap

    Directory of Open Access Journals (Sweden)

    Krogh Morten

    2004-12-01

    Full Text Available Abstract Background Ranked gene lists from microarray experiments are usually analysed by assigning significance to predefined gene categories, e.g., based on functional annotations. Tools performing such analyses are often restricted to a category score based on a cutoff in the ranked list and a significance calculation based on random gene permutations as null hypothesis. Results We analysed three publicly available data sets, in each of which samples were divided in two classes and genes ranked according to their correlation to class labels. We developed a program, Catmap (available for download at http://bioinfo.thep.lu.se/Catmap), to compare different scores and null hypotheses in gene category analysis, using Gene Ontology annotations for category definition. When a cutoff-based score was used, results depended strongly on the choice of cutoff, introducing an arbitrariness in the analysis. Comparing results using random gene permutations and random sample permutations, respectively, we found that the assigned significance of a category depended strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller p-values for large categories with many coexpressed genes. Conclusions In gene category analyses of ranked gene lists, a cutoff-independent score is preferable. The choice of null hypothesis is very important; random gene permutations do not work well as an approximation to sample label permutations.
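
    To make the gene-permutation null hypothesis in this record concrete, here is a minimal Python sketch. It is not Catmap's implementation: the rank-sum score is an assumed example of a cutoff-independent score, and the gene names, permutation count and seed are hypothetical. The sample-label permutation null that the record recommends needs the underlying expression data and is not shown.

```python
import random
from typing import List, Set


def rank_sum(ranked_genes: List[str], category: Set[str]) -> int:
    """Sum of 1-based ranks of category genes (lower means nearer the top)."""
    return sum(rank for rank, gene in enumerate(ranked_genes, start=1)
               if gene in category)


def gene_permutation_p_value(ranked_genes: List[str], category: Set[str],
                             n_perm: int = 2000, seed: int = 0) -> float:
    """Estimate how often a random gene set of equal size scores as well.

    Drawing random rank positions is equivalent to permuting gene labels.
    """
    rng = random.Random(seed)
    observed = rank_sum(ranked_genes, category)
    size = sum(1 for gene in ranked_genes if gene in category)
    n_genes = len(ranked_genes)
    hits = 0
    for _ in range(n_perm):
        null_score = sum(rng.sample(range(1, n_genes + 1), size))
        if null_score <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 51)]  # hypothetical ranked list
    top_category = {"g1", "g2", "g3", "g4", "g5"}
    print(gene_permutation_p_value(genes, top_category))
```

    Because this null treats genes as independent, a large category of coexpressed genes can look far more significant than it would under sample-label permutations, which is the effect the record reports.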

  10. Towards a Library of Standard Operating Procedures (SOPs) for (meta)genomic annotation

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Angiuoli, Samuel V.; Cochrane, Guy; Field, Dawn; Garrity, George; Gussman, Aaron; Kodira, Chinnappa D.; Klimke, William; Kyrpides, Nikos; Madupu, Ramana; Markowitz, Victor; Tatusova, Tatiana; Thomson, Nick; White, Owen

    2008-04-01

    Genome annotations describe the features of genomes and accompany sequences in genome databases. The methodologies used to generate genome annotation are diverse and typically vary amongst groups. Descriptions of the annotation procedure are helpful in interpreting genome annotation data. Standard Operating Procedures (SOPs) for genome annotation describe the processes that generate genome annotations. Some groups are currently documenting procedures but standards are lacking for structure and content of annotation SOPs. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse a central online repository of SOPs.

  11. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease.

    Science.gov (United States)

    Sifrim, Alejandro; Van Houdt, Jeroen Kj; Tranchevent, Leon-Charles; Nowakowska, Beata; Sakai, Ryo; Pavlopoulos, Georgios A; Devriendt, Koen; Vermeesch, Joris R; Moreau, Yves; Aerts, Jan

    2012-01-01

    The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.

  12. Annotation Method (AM): SE40_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available used for annotation and identification of the compounds. Retention time of 56 authentic chemicals (13-OxoOD...lcone, Nicotinamide, Nicotinate, Pantothenate, Phloretin, Prunin, Rutin, S-Adenosyl-L-methionine, Tomatine, UMP, Uridine) are used for annotation and identification of the compounds. ...

  13. Product annotations - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ...ile name: kome_product_annotation.zip File URL: ftp://ftp.biosciencedbc.jp/archiv...

  14. From the Margins to the Center: The Future of Annotation.

    Science.gov (United States)

    Wolfe, Joanna L.; Neuwirth, Christine M.

    2001-01-01

    Describes the importance of annotation to reading and writing practices and reviews new technologies that complicate the ways annotation can be used to support and enhance traditional reading, writing, and collaboration processes. Emphasizes issues and methods that will be productive for enhancing theories of workplace and classroom communication…

  15. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  16. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...... acidocaldarius genome, and several plasmids and viruses have so far been analysed and annotated using MUTAGEN....

  17. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of the Human Genome Project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. This paper reviews the major bioinformatics tools and technologies currently available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.

  18. Evaluating techniques for metagenome annotation using simulated sequence data.

    Science.gov (United States)

    Randle-Boggis, Richard J; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D

    2016-07-01

    The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180
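
    The trade-off between sensitivity and precision described in this record can be shown with a small Python sketch over a simulated read set. The definitions used here are an assumption, not necessarily those of the study: precision is the share of annotated reads that are correct, and sensitivity is the share of all reads that are correctly annotated. Read identifiers and genus labels are made up.

```python
from typing import Dict, Optional, Tuple


def precision_and_sensitivity(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Score annotations against known origins of simulated reads.

    A prediction of None means the tool left the read unannotated.
    """
    annotated = {read: label for read, label in predicted.items()
                 if label is not None and read in truth}
    correct = sum(1 for read, label in annotated.items()
                  if truth[read] == label)
    precision = correct / len(annotated) if annotated else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Hypothetical simulated reads with known genus of origin.
    truth = {"r1": "Bacillus", "r2": "Bacillus",
             "r3": "Vibrio", "r4": "Vibrio"}
    lenient = {"r1": "Bacillus", "r2": "Bacillus",
               "r3": "Bacillus", "r4": "Vibrio"}
    strict = {"r1": "Bacillus", "r2": "Bacillus", "r3": None, "r4": None}
    print("lenient:", precision_and_sensitivity(truth, lenient))
    print("strict: ", precision_and_sensitivity(truth, strict))
```

    In this toy case the strict settings annotate fewer reads but make no wrong calls, so precision rises while sensitivity falls, mirroring the pattern the record describes.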

  19. JAFA: a protein function annotation meta-server

    DEFF Research Database (Denmark)

    Friedberg, Iddo; Harder, Tim; Godzik, Adam

    2006-01-01

    With the high number of sequences and structures streaming in from genomic projects, there is a need for more powerful and sophisticated annotation tools. Most problematic of the annotation efforts is predicting gene and protein function. Over the past few years there has been considerable progress...

  20. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  1. The GATO gene annotation tool for research laboratories.

    Science.gov (United States)

    Fujita, A; Massirer, K B; Durham, A M; Ferreira, C E; Sogayar, M C

    2005-11-01

    Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB. PMID:16258624

  2. Online Metacognitive Strategies, Hypermedia Annotations, and Motivation on Hypertext Comprehension

    Science.gov (United States)

    Shang, Hui-Fang

    2016-01-01

    This study examined the effect of online metacognitive strategies, hypermedia annotations, and motivation on reading comprehension in a Taiwanese hypertext environment. A path analysis model was proposed based on the assumption that if English as a foreign language learners frequently use online metacognitive strategies and hypermedia annotations,…

  3. Behavioral Contributions to "Teaching of Psychology": An Annotated Bibliography

    Science.gov (United States)

    Karsten, A. M.; Carr, J. E.

    2008-01-01

    An annotated bibliography that summarizes behavioral contributions to the journal "Teaching of Psychology" from 1974 to 2006 is provided. A total of 116 articles of potential utility to college-level instructors of behavior analysis and related areas were identified, annotated, and organized into nine categories for ease of accessibility.…

  4. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    Next generation sequencing (NGS) has revolutionized the field of genomics and its wide range of applications has resulted in the genome-wide analysis of hundreds of species and the development of thousands of computational tools. This thesis represents my work on NGS analysis of four species, Lotus japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts: genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and mycorrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...

  5. Semantator: annotating clinical narratives with semantic web ontologies.

    Science.gov (United States)

    Song, Dezhao; Chute, Christopher G; Tao, Cui

    2012-01-01

    To facilitate clinical research, clinical data needs to be stored in a machine-processable and understandable way. Manually annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfactory. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.

  6. On Semantic Annotation in Clarin-PL Parallel Corpora

    Directory of Open Access Journals (Sweden)

    Violetta Koseska-Toszewa

    2015-12-01

    Full Text Available In the article, the authors present a proposal for semantic annotation in Clarin-PL parallel corpora: Polish-Bulgarian-Russian and Polish-Lithuanian ones. Semantic annotation of quantification is a novum in developing sentence level semantics in multilingual parallel corpora. This is why our semantic annotation is manual. The authors hope it will be interesting to IT specialists working on automatic processing of the given natural languages. Semantic annotation defined the way it is defined here will make contrastive studies of natural languages more efficient, which in turn will help verify the results of those studies, and will certainly improve human and machine translations.

  7. Semi-automatic conversion of BioProp semantic annotation to PASBio annotation

    OpenAIRE

    Dai Hong-Jie; Tsai Richard; Huang Chi-Hsin; Hsu Wen-Lian

    2008-01-01

    Background: Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire d...

  8. The Disease and Gene Annotations (DGA): an annotation resource for human disease.

    Science.gov (United States)

    Peng, Kai; Xu, Wei; Zheng, Jianyong; Huang, Kegui; Wang, Huisong; Tong, Jiansong; Lin, Zhifeng; Liu, Jun; Cheng, Wenqing; Fu, Dong; Du, Pan; Kibbe, Warren A; Lin, Simon M; Xia, Tian

    2013-01-01

    Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.

  9. The High Throughput Sequence Annotation Service (HT-SAS) – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Full Text Available Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results: To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein, a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using a binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine its accuracy and usefulness. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. Conclusion: Using HT-SAS we created new, or enriched already established, annotations for over 20% of proteins from Plasmodium berghei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources, and for researchers working with protein datasets, especially from poorly characterized organisms.
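    The word-overrepresentation step described in this record can be illustrated with a minimal sketch. It is not the HT-SAS implementation: the abstract mentions only a "binomial statistics approximation", so the exact binomial tail used here, the per-abstract counting, the function names, the background frequencies and the threshold `alpha` are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(pool, background_freq, alpha=0.01):
    """Rank words that occur in more abstracts of `pool` than expected.

    pool            -- list of abstract texts gathered for one protein
    background_freq -- hypothetical dict: word -> fraction of all abstracts
                       in the reference corpus that contain the word
    alpha           -- assumed significance threshold
    """
    n = len(pool)
    counts = {}
    for abstract in pool:
        # Count each word at most once per abstract.
        for word in set(abstract.lower().split()):
            counts[word] = counts.get(word, 0) + 1

    hits = []
    for word, k in counts.items():
        p = background_freq.get(word)
        if p is None:
            continue  # no background estimate, so no test is possible
        p_value = binomial_tail(k, n, p)
        if p_value < alpha:
            hits.append((word, p_value))
    return sorted(hits, key=lambda item: item[1])


# Toy data, invented for illustration only.
toy_pool = ["kinase binds atp", "the kinase domain", "kinase of the cell"]
toy_background = {"kinase": 0.01, "the": 0.6, "of": 0.5}
keywords = overrepresented_words(toy_pool, toy_background)
```

    Common words such as "the" have a high background frequency and so do not reach significance, while a rare word present in every abstract of the pool does.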

  10. Evaluation of probabilistic and logical inference for a SNP annotation system.

    Science.gov (United States)

    Shen, Terry H; Tarczy-Hornoch, Peter; Detwiler, Landon T; Cadag, Eithon; Carlson, Christopher S

    2010-06-01

    Genome wide association studies (GWAS) are an important approach to understanding the genetic mechanisms behind human diseases. Single nucleotide polymorphisms (SNPs) are the predominant markers used in genome wide association studies, and the ability to predict which SNPs are likely to be functional is important for both a priori and a posteriori analyses of GWA studies. This article describes the design, implementation and evaluation of a family of systems for the purpose of identifying SNPs that may cause a change in phenotypic outcomes. The methods described in this article characterize the feasibility of combinations of logical and probabilistic inference with federated data integration for both point and regional SNP annotation and analysis. Evaluations of the methods demonstrate the overall strong predictive value of logical, and logical with probabilistic, inference applied to the domain of SNP annotation.

  11. STRUCTURED WIKI WITH ANNOTATION FOR KNOWLEDGE MANAGEMENT: AN APPLICATION TO CULTURAL HERITAGE

    Directory of Open Access Journals (Sweden)

    Eric Leclercq

    2011-01-01

    Full Text Available In this paper, we highlight how semantic wikis can be relevant solutions for building cooperative data-driven applications in domains characterized by a rapid evolution of knowledge. We point out the semantic capabilities of annotated databases and structured wikis to provide better quality of content, to support complex queries and finally to support different types of users. Then we compare database application development with wikis for domains that encompass evolving knowledge. We detail the architecture of WikiBridge, a semantic wiki, which integrates template forms and allows complex annotations as well as consistency checking. We describe the archaeological CARE project, and explain the conceptual modeling approach. A specific section is dedicated to ontology design, which is the compulsory foundational knowledge for the application. We finally report related work on the use of semantic wikis in archaeological projects.

  12. Metalloproteomics: High-Throughput Structural and Functional Annotation of Proteins in Structural Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Shi, W.; Zhan, C.; Ignatov, A.; Manjasetty, B.; Marinkovic, N.; Sullivan, M.; Huang, R.; Chance, M.; Li, H.; et al.

    2005-01-01

    A high-throughput method for measuring transition metal content based on quantitation of X-ray fluorescence signals was used to analyze 654 proteins selected as targets by the New York Structural GenomiX Research Consortium. Over 10% showed the presence of transition metal atoms in stoichiometric amounts; these totals as well as the abundance distribution are similar to those of the Protein Data Bank. Bioinformatics analysis of the identified metalloproteins in most cases supported the metalloprotein annotation; identification of the conserved metal binding motif was also shown to be useful in verifying structural models of the proteins. Metalloproteomics provides a rapid structural and functional annotation for these sequences and is shown to be approximately 95% accurate in predicting the presence or absence of stoichiometric metal content. The project's goal is to assay at least 1 member from each Pfam family; approximately 500 Pfam families have been characterized with respect to transition metal content so far.

  13. The Collation of Three Versions of Front Annotations of the Siku Quanshu: Based on 365 Pieces of Front Annotations

    Directory of Open Access Journals (Sweden)

    Wen-Chin Lan

    2015-06-01

    Full Text Available A bibliographic annotation (tiyao 提要) is a brief description of the author and content of a book as well as a comment on, or a critique of, the book. The Siku Quanshu Zongmu (四庫全書總目) has long been viewed as a model of the traditional Chinese annotated bibliography and its bibliographic annotations have been praised by many scholars. It is suggested that these annotations can be used as examples for learning how to write bibliographic annotations. The compilation of the Siku Quanshu Zongmu went through three stages: (1) individual draft annotations (分纂稿) written by various scholars, (2) front annotations (書前提要) revised and modified by the officials of the Siku Quanshu Project, and (3) finalized annotations (總目提要) mainly edited and compiled by Ji Yun (紀昀). Initially, the Siku Quanshu had seven written copies and there were seven sets of front annotations. They were housed separately in the seven chambers that the Qianlong Emperor (乾隆, r. 1736-1795) built to store the Siku Quanshu. Currently, only three of the seven sets are intact and extant: Wenyuange (文淵閣), Wensuge (文溯閣), and Wenjinge (文津閣). This study attempts to collate the three versions of front annotations. We chose 365 pieces of front annotations from each of the aforementioned three sets. The results corroborate that there exist variations and differences among the three sets of front annotations. This paper presents three examples to illustrate how the collation task was done. Since these annotations were transcribed manually, it is quite common for the three sets to use variant forms of the same character. The descriptions of author, title, or number of volumes might be different as well. In particular, the annotation for the same book might differ slightly or significantly among the three sets. This paper is a summary report of the preliminary findings of the collation task.

  14. cDNA2Genome: A tool for mapping and annotating cDNAs

    Directory of Open Access Journals (Sweden)

    Suhai Sandor

    2003-09-01

    Full Text Available Background: In recent years several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However, some of these cDNAs are error prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first requires the identification of potentially problematic cDNAs in order to speed up the subsequent annotation process. Results: cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up-to-date databases, especially in the case of ESTs and mRNAs, in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis such as program pipelines, or the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in a graphical form. cDNA2Genome has been implemented under the W3H task framework, which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis. Conclusions: cDNA2Genome represents a new versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.

  15. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Directory of Open Access Journals (Sweden)

    Jianfang Cao

    2015-01-01

    Full Text Available With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on manual annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.

  16. Semantic Based Image Annotation Using Descriptive Features and Retagging approach

    Directory of Open Access Journals (Sweden)

    P.Nagarani

    2012-02-01

    Full Text Available The semantic-based annotation of an image is a very important and difficult task in content-based image retrieval (CBIR). The low-level features of the images are described using color and texture features, and the proposed model is used for semantic annotation of images. Textual annotations, or tags, attached to multimedia content are also among the most effective approaches to organizing and supporting search over digital images and multimedia databases. The quality of the tags was refined using an image retagging method. The process is formulated as a multi-path graph-based problem, which in parallel identifies the visual content of the images, the semantic correlation of the tags, and the primary information provided by users. Automatic image annotation is preferred because, with countless images in our lives, it is not possible to annotate them all by hand; annotation by computer is therefore a promising solution to this problem. The ability to annotate images semantically based on the objects that they contain is essential in image retrieval as it provides the mechanism to take advantage of existing text retrieval systems.

  17. Semantic Based Image Annotation Using Descriptive Features and Retagging approach

    Directory of Open Access Journals (Sweden)

    P.Nagarani

    2012-03-01

    Full Text Available The semantic-based annotation of an image is a very important and difficult task in content-based image retrieval (CBIR). The low-level features of the images are described using color and texture features, and the proposed model is used for semantic annotation of images. Textual annotations, or tags, attached to multimedia content are also among the most effective approaches to organizing and supporting search over digital images and multimedia databases. The quality of the tags was refined using an image retagging method. The process is formulated as a multi-path graph-based problem, which in parallel identifies the visual content of the images, the semantic correlation of the tags, and the primary information provided by users. Automatic image annotation is preferred because, with countless images in our lives, it is not possible to annotate them all by hand; annotation by computer is therefore a promising solution to this problem. The ability to annotate images semantically based on the objects that they contain is essential in image retrieval as it provides the mechanism to take advantage of existing text retrieval systems.

  18. Open semantic annotation of scientific publications using DOMEO

    Directory of Open Access Journals (Sweden)

    Ciccarese Paolo

    2012-04-01

    Full Text Available Background: Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. Methods: The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. Results: We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF’s next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework.

  19. Scoring consensus of multiple ECG annotators by optimal sequence alignment.

    Science.gov (United States)

    Haghpanahi, Masoumeh; Sameni, Reza; Borkholder, David A

    2014-01-01

    Development of ECG delineation algorithms has been an area of intense research in the field of computational cardiology for the past few decades. However, devising evaluation techniques for scoring and/or merging the results of such algorithms, in the presence or absence of gold standards, remains a challenge. This is mainly due to missed or erroneous determination of fiducial points in the results of different annotation algorithms. The discrepancy between different annotators increases when the reference signal includes arrhythmias or significant noise and its morphology deviates from a clean ECG signal. In this work, we propose a new approach to evaluate and compare the results of different annotators under such conditions. Specifically, we use sequence alignment techniques similar to those used in bioinformatics for the alignment of gene sequences. Our approach is based on dynamic programming where adequate mismatch penalties, depending on the type of the fiducial point and the underlying signal, are defined to optimally align the annotation sequences. We also discuss how to extend the algorithm for more than two sequences by using suitable data structures to align multiple annotation sequences with each other. Once the sequences are aligned, different heuristics are devised to evaluate the performance against a gold standard annotation, or to merge the results of multiple annotations when no gold standard exists. PMID:25570339
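    The dynamic-programming alignment of two annotation sequences described in this record can be sketched as a Needleman-Wunsch style global alignment. This is an illustration under stated assumptions, not the authors' algorithm: the cost function (time difference for same-type fiducial points, forbidden otherwise), the constant gap penalty, the tolerance and the sample annotations are all invented here.

```python
def align_annotations(a, b, gap=50.0, tol=40.0):
    """Globally align two annotation sequences by minimum total cost.

    a, b -- lists of (time_ms, label) tuples sorted by time
    gap  -- assumed penalty for a fiducial point missing in the other sequence
    tol  -- assumed maximum time difference (ms) for two points to match
    Returns (total_cost, pairs), where pairs holds (i, j) index tuples and
    None marks a point left unmatched in the other sequence.
    """
    inf = float("inf")

    def cost(x, y):
        # Only points of the same type that are close in time may be paired.
        if x[1] != y[1] or abs(x[0] - y[0]) > tol:
            return inf
        return abs(x[0] - y[0])

    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]),  # pair the points
                d[i - 1][j] + gap,  # a[i-1] has no partner in b
                d[i][j - 1] + gap,  # b[j-1] has no partner in a
            )

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return d[n][m], pairs


# Hypothetical annotations: the second annotator missed the R peak.
ann_a = [(100, "P"), (200, "R"), (400, "T")]
ann_b = [(105, "P"), (395, "T")]
total_cost, alignment = align_annotations(ann_a, ann_b)
```

    Once aligned, unmatched entries (those paired with `None`) correspond to missed or extra fiducial points, which is the information a scoring or merging heuristic would start from.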

  20. MimoSA: a system for minimotif annotation

    Directory of Open Access Journals (Sweden)

    Kundeti Vamsi

    2010-06-01

    Full Text Available Background: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to

  1. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    Energy Technology Data Exchange (ETDEWEB)

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott; Motin, Vladimir L.; Adkins, Joshua N.

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent

  2. Eval: A software package for analysis of genome annotations

    Directory of Open Access Journals (Sweden)

    Brent Michael R

    2003-10-01

    Full Text Available Summary: Eval is a flexible tool for analyzing the performance of gene annotation systems. It provides summaries and graphical distributions for many descriptive statistics about any set of annotations, regardless of their source. It also compares sets of predictions to standard annotations and to one another. Input is in the standard Gene Transfer Format (GTF). Eval can be run interactively or via the command line, in which case output options include easily parsable tab-delimited files. Availability: To obtain the module package with documentation, go to http://genes.cse.wustl.edu/ and follow links for Resources, then Software. Please contact brent@cse.wustl.edu

  3. An annotation system for 3D fluid flow visualization

    Science.gov (United States)

    Loughlin, Maria M.; Hughes, John F.

    1995-01-01

    Annotation is a key activity of data analysis. However, current systems for data analysis focus almost exclusively on visualization. We propose a system which integrates annotations into a visualization system. Annotations are embedded in 3D data space, using the Post-it metaphor. This embedding allows contextual-based information storage and retrieval, and facilitates information sharing in collaborative environments. We provide a traditional database filter and a Magic Lens filter to create specialized views of the data. The system has been customized for fluid flow applications, with features which allow users to store parameters of visualization tools and sketch 3D volumes.

  4. DDBJ in collaboration with mass-sequencing teams on annotation

    OpenAIRE

    Tateno, Y; Saitou, N; Okubo, K; Sugawara, H.; Gojobori, T

    2004-01-01

    In the past year, we at DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp) collected and released 1 066 084 entries or 718 072 425 bases including the whole chromosome 22 of chimpanzee, the whole-genome shotgun sequences of silkworm and various others. On the other hand, we hosted workshops for human full-length cDNA annotation and participated in jamborees of mouse full-length cDNA annotation. The annotated data are made public at DDBJ. We are also in collaboration with a RIKEN team to...

  5. Semantic annotation for biological information retrieval system.

    Science.gov (United States)

    Oshaiba, Mohamed Marouf Z; El Houby, Enas M F; Salah, Akram

    2015-01-01

    Online literature is growing at a tremendous rate, and the biological domain is one of the fastest-growing fields. Biological researchers have difficulty finding what they are searching for effectively and efficiently. The aim of this research is to find documents that contain any combination of biological process and/or molecular function and/or cellular component. This research proposes a framework that helps researchers retrieve meaningful documents related to their asserted terms based on the Gene Ontology (GO). The system utilizes GO by semantically decomposing it into three subontologies (cellular component, biological process, and molecular function). Researchers have the flexibility to choose search terms from any combination of the three subontologies. Documents are annotated to create an index of the biological terms they contain, which speeds up the search process. Query expansion is used to infer terms semantically related to the asserted terms; it increases the number of meaningful search results by using term synonyms and term relationships. The system uses a ranking method to order the retrieved documents based on ranking weights. The proposed system meets researchers' need to find documents that fit the asserted terms semantically.

  6. Annotation and Curation of Uncharacterized proteins- Challenges

    Directory of Open Access Journals (Sweden)

    Johny Ijaq

    2015-03-01

    Full Text Available Hypothetical proteins are proteins that are predicted to be expressed from an open reading frame (ORF), constituting a substantial fraction of proteomes in both prokaryotes and eukaryotes. Genome projects have led to the identification of many therapeutic targets, the putative function of the protein and their interactions. In this review we have enlisted various methods. Annotation linked to structural and functional prediction of hypothetical proteins assists in the discovery of new structures and functions serving as markers and pharmacological targets for drug design, discovery and screening. Mass spectrometry is an analytical technique for validating protein characterisation. Matrix-assisted laser desorption ionization–mass spectrometry (MALDI-MS) is an efficient analytical method. Microarrays and protein expression profiles help in understanding biological systems through a systems-wide study of proteins and their interactions with other proteins and non-proteinaceous molecules to control complex processes in cells, tissues and even whole organisms. Next generation sequencing technology accelerates multiple areas of genomics research.

  7. Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data.

    Directory of Open Access Journals (Sweden)

    Laura Miozzi

    Full Text Available BACKGROUND: High-throughput gene expression data can predict gene function through the "guilt by association" principle: coexpressed genes are likely to be functionally associated. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. CONCLUSIONS/SIGNIFICANCE: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several genetic diseases of unknown molecular basis.
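
    The idea of selecting a coexpression group by majority rule and then transferring its annotation can be sketched as follows. The abstract does not define the rule precisely, so this sketch assumes that a group is "functionally characterized" when more than half of its already annotated genes share a term; the function names and gene identifiers are hypothetical.

```python
from collections import Counter


def majority_annotation(group, annotations):
    """Return a term carried by more than half of the annotated genes in `group`, else None.

    `annotations` maps gene id -> set of functional terms (assumed data layout).
    """
    annotated = [gene for gene in group if annotations.get(gene)]
    if not annotated:
        return None
    # Each gene contributes each of its terms once.
    counts = Counter(term for gene in annotated for term in annotations[gene])
    term, hits = counts.most_common(1)[0]
    return term if hits > len(annotated) / 2 else None


def predict_annotations(group, annotations):
    """Propose the majority term for every gene in the group that lacks it."""
    term = majority_annotation(group, annotations)
    if term is None:
        return {}
    return {gene: term for gene in group if term not in annotations.get(gene, set())}


# Hypothetical coexpression group: two of three annotated genes share a term.
example_annotations = {
    "g1": {"lipid metabolism"},
    "g2": {"lipid metabolism"},
    "g3": {"transport"},
    "g4": set(),
}
example_group = ["g1", "g2", "g3", "g4"]
predictions = predict_annotations(example_group, example_annotations)
```

    In this toy group the unannotated gene `g4` and the differently annotated gene `g3` both receive "lipid metabolism" as a predicted annotation, which is the guilt-by-association step in miniature.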

  8. Annotation et rature Annotation and Deletion: Outline of a Sociology of Forms

    Directory of Open Access Journals (Sweden)

    Axel Pohn-Weidinger

    2012-05-01

    Full Text Available This text examines the graphical traces left on a collection of social housing application forms: annotations, erasures, crossed-out words and scribbled comments. The study of these traces, left in the margins of the categories on printed administrative forms while they were being completed, shows the exercising of a right as a problematic operation. Citizens making applications must describe their living situation in a way that will establish their eligibility for a right, but quite often it is impossible to convey this through the form's predetermined categories. The annotations and comments left on the form attempt to open the legal classification of situations to considering the uniqueness of the applicant's living circumstances. They show the use of a right as an introspective accomplishment, requiring applicants to work both on their own perception of their situation and on the one the institution offers through the form, the negotiation and implementation of which are at the heart of producing the administrative file.

  9. MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

    Science.gov (United States)

    Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

    2011-11-01

    The use of Next-Generation Sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge in programming.

  10. Creating New Medical Ontologies for Image Annotation A Case Study

    CERN Document Server

    Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

    2012-01-01

    Creating New Medical Ontologies for Image Annotation focuses on the automatic annotation of medical images, a problem the authors solve in an original manner. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by the authors are compared with other efficient similar algorithms. In addition, the authors treat the problem of creating ontologies automatically, starting from Medical Subject Headings (MeSH). They present some efficient and relevant annotation models and also the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Based on a text query, the system retrieves the images that contain objects described by the keywords.

  11. Geothermal wetlands: an annotated bibliography of pertinent literature

    Energy Technology Data Exchange (ETDEWEB)

    Stanley, N.E.; Thurow, T.L.; Russell, B.F.; Sullivan, J.F.

    1980-05-01

    This annotated bibliography covers the following topics: algae, wetland ecosystems; institutional aspects; macrophytes - general, production rates, and mineral absorption; trace metal absorption; wetland soils; water quality; and other aspects of marsh ecosystems. (MHR)

  12. Descriptive Cataloging: A Selected, Annotated Bibliography, 1984-1985.

    Science.gov (United States)

    Cook, C. Donald; Jones, Ellen

    1986-01-01

    This annotated bibliography of materials published during 1984-1985 on descriptive cataloging covers bibliographic control, Anglo American Cataloging Rules, 2nd edition (AACR2), specific types of materials, authority control, retrospective conversion, management issues, expert systems, and manuals. (EM)

  13. Annotation Method (AM): SE26_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  14. Annotation Method (AM): SE34_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  15. Annotation Method (AM): SE29_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2013A ...

  16. Annotation Method (AM): SE36_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  17. Annotation Method (AM): SE32_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  18. Annotation Method (AM): SE17_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  19. Annotation Method (AM): SE8_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  20. Annotation Method (AM): SE5_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  1. Annotation Method (AM): SE28_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2013A ...

  2. Annotation Method (AM): SE7_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAOT2012A ...

  3. Annotation Method (AM): SE14_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  4. Annotation Method (AM): SE13_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  5. Annotation Method (AM): SE2_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2011A ...

  6. Annotation Method (AM): SE12_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  7. Annotation Method (AM): SE9_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  8. Annotation Method (AM): SE1_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2011A ...

  9. Annotation Method (AM): SE6_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAOT2012A ...

  10. Annotation Method (AM): SE3_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2011A ...

  11. Annotation Method (AM): SE4_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available compound name or compound category name can assign, predicted molecular formulas are used for the annotatio...n. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  12. Annotation Method (AM): SE16_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  13. Annotation Method (AM): SE20_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. Terms of chemical category

  14. Annotation Method (AM): SE15_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAOT201112 ...

  15. Annotation Method (AM): SE25_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  16. Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  17. Annotation Method (AM): SE33_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  18. Annotation Method (AM): SE27_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  19. Annotation Method (AM): SE35_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  20. Annotation Method (AM): SE30_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  1. Annotation Method (AM): SE31_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  2. Annotation Method (AM): SE10_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available o compound name or compound category name can assign, predicted molecular formulas are used for the annotati...on. Peaks without predicted molecular formula are assigned as unidentified peak. TogoAnalysisMethodID=TAFT2012A ...

  3. Effects of dehydration on performance in man: Annotated bibliography

    Science.gov (United States)

    Greenleaf, J. E.

    1973-01-01

    A compilation of studies on the effect of dehydration on human performance and related physiological mechanisms. The annotations are listed in alphabetical order by first author and cover material through June 1973.

  4. OntoELAN: An Ontology-based Linguistic Multimedia Annotator

    CERN Document Server

    Chebotko, Artem; Lu, Shiyong; Fotouhi, Farshad; Aristar, Anthony; Brugman, Hennie; Klassmann, Alexander; Sloetjes, Han; Russel, Albert; Wittenburg, Peter

    2009-01-01

    Despite its scientific, political, and practical value, comprehensive information about human languages, in all their variety and complexity, is not readily obtainable and searchable. One reason is that many language data are collected as audio and video recordings, which poses a challenge to document indexing and retrieval. Annotation of multimedia data provides an opportunity for making the semantics explicit and facilitates the searching of multimedia documents. We have developed OntoELAN, an ontology-based linguistic multimedia annotator that features: (1) support for loading and displaying ontologies specified in OWL; (2) creation of a language profile, which allows a user to choose a subset of terms from an ontology and conveniently rename them if needed; (3) creation of ontological tiers, which can be annotated with profile terms and, therefore, corresponding ontological terms; and (4) saving annotations in the XML format as Multimedia Ontology class instances and, linked to them, class instances of o...

  5. An Annotated Checklist of the Fishes of Samoa

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — All fishes currently known from the Samoan Islands are listed by their scientific and Samoan names. Species entries are annotated to include the initial Samoan...

  6. Collaborative Semantic Annotation of Images : Ontology-Based Model

    Directory of Open Access Journals (Sweden)

    Damien E. ZOMAHOUN

    2015-12-01

    Full Text Available In the quest for models that could help to represent the meaning of images, some approaches have used contextual knowledge by building semantic hierarchies. Others have resorted to the integration of image analysis improvement knowledge and image interpretation using ontologies. The images are often annotated with a set of keywords (or ontologies), whose relevance remains highly subjective and related to only one interpretation (one annotator). However, an image can get many associated semantics because annotators can interpret it differently. The purpose of this paper is to propose a collaborative annotation system that brings out the meaning of images from the different interpretations of annotators. The different works carried out in this paper lead to a semantic model of an image, i.e. the different meanings that a picture may have. This method relies on the different tools of the Semantic Web, especially ontologies.

  7. Annotated Bibliography of Recent Research Related to Academic Advising

    Science.gov (United States)

    Mottarella, Karen, Comp.

    2011-01-01

    This article presents an annotated bibliography of recent research related to academic advising. It includes research papers that focus on advising and a special section of the "Journal of Career Development" that is devoted to multicultural graduate advising relationships.

  8. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    CERN Document Server

    Hassanzadeh, Hamed; 10.5121/ijwest.2011.2203

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as multilinguality, scalability, and issues which are related to diversity and inconsistency in content of different web pages. Due to the wide range of domains and the dynamic environments that the Semantic Annotation systems must be performed on, the problem of automating annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like, semi-supervised learning and active learn...

  9. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Background: Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results: We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a

  10. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other

  11. TOPSAN: use of a collaborative environment for annotating, analyzing and disseminating data on JCSG and PSI structures

    International Nuclear Information System (INIS)

    Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications. The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in

  12. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    Directory of Open Access Journals (Sweden)

    Kumar Parijat Tripathi

    Full Text Available RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report of functional and gene ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature and provides an opportunity to add new plugins in the future. Web application is

  13. Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

    Directory of Open Access Journals (Sweden)

    Smith Richard D

    2011-08-01

    Full Text Available Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function. Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

  14. DATA ANNOTATION AND RELATIONS MODELING FOR INTEGRATED OMICS IN CLINICAL RESEARCH

    Directory of Open Access Journals (Sweden)

    Arno Lukas

    2010-07-01

    Full Text Available Omics has massively permeated translational clinical research, with numerous diseases being covered by Omics studies from the genome to the metabolome level. Integrating these disease-specific Omics tracks appears a logical next step for building the foundation of Systems Biology and Systems Medicine. Here, coherence of individual Omics tracks regarding clinical hypothesis, samples and clinical descriptors, and finally data handling and integration become pivotal. We present a data integration, annotation and relations modeling concept for heterogeneous Omics data and workflows. With molecular features at the center of all Omics, we link the result profiles from different Omics tracks characterizing a specific disease phenotype to a common human molecular reference network, allowing seamless integration and subsequent support in the interpretation of Omics screening results. Our concept rests on data structures for representing objects specified by metadata and content. For handling diverse Omics tracks, a flexible structure for content is proposed, allowing data representation at different levels of granularity as demanded by the type of Omics and the specific type of data. Content on the molecular level includes deep annotation of molecular features on the gene and protein level. Based on this annotation, pair-wise relations between molecular objects are built, turning the molecular annotation into a network of relations (molecular feature graph). Such a relation network is also built on the Omics data level, combining explicit relations derived from study setup and implicit relations generated by mining metadata and content (Omics data graph). Finally, both graphs are merged utilizing the molecular feature level as common denominator, enabling a persistent integration and subsequent interpretation of Omics profiling results in the realm of a given clinical hypothesis. We present a case study on integrating transcriptomics and proteomics data on chronic

  15. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  16. CATMAID: collaborative annotation toolkit for massive amounts of image data

    OpenAIRE

    Saalfeld, Stephan; Cardona, Albert; Hartenstein, Volker; Tomančák, Pavel

    2009-01-01

    Summary: High-resolution, three-dimensional (3D) imaging of large biological specimens generates massive image datasets that are difficult to navigate, annotate and share effectively. Inspired by online mapping applications like GoogleMaps™, we developed a decentralized web interface that allows seamless navigation of arbitrarily large image stacks. Our interface provides means for online, collaborative annotation of the biological image data and seamless sharing of regions of interest by boo...

  17. Semantic annotation of biological concepts interplaying microbial cellular responses

    Directory of Open Access Journals (Sweden)

    Carreira Rafael

    2011-11-01

    Full Text Available Abstract Background Automated extraction systems have become a time saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to Biomedical Text Mining tools is that of learning to recognise a wide variety of biological concepts with different functional roles to assist in these processes. Results Here, we present a novel corpus concerning the integrated cellular responses to nutrient starvation in the model-organism Escherichia coli. Our corpus is a unique resource in that it annotates biomedical concepts that play a functional role in expression, regulation and metabolism. Namely, it includes annotations for genetic information carriers (genes and DNA, RNA molecules), proteins (transcription factors, enzymes and transporters), small metabolites, physiological states and laboratory techniques. The corpus consists of 130 full-text papers with a total of 59043 annotations for 3649 different biomedical concepts; the two dominant classes are genes (highest number of unique concepts) and compounds (most frequently annotated concepts), whereas other important cellular concepts such as proteins account for no more than 10% of the annotated concepts. Conclusions To the best of our knowledge, a corpus that details such a wide range of biological concepts has never been presented to the text mining community. The inter-annotator agreement statistics provide evidence of the importance of a consolidated background when dealing with such complex descriptions, the ambiguities naturally arising from the terminology and their impact for modelling purposes. Availability is granted for the full-text corpora of 130 freely accessible documents, the annotation scheme and the annotation guidelines. Also, we include a corpus of 340 abstracts.

  18. Developing a lexical resource annotated with semantic roles for Portuguese

    OpenAIRE

    Leonardo Zilio; Carlos Ramisch; Maria José Bocorny Finatto

    2014-01-01

    The objectives of this study are as follows: to present a methodology for the development of a lexical resource with semantic information; to compare semantic roles in specialized and non-specialized language; and to observe the semantic role labeling (SRL) made by a group of annotators. Two experiments revolving around SRL in Portuguese were developed: a comparison between data in specialized and non-specialized language corpora; and an annotation evaluation for verifying the agreement among...

  19. Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

    Directory of Open Access Journals (Sweden)

    Danuta Roszko

    2015-06-01

    Full Text Available In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT) formed for the idea of Polish-Lithuanian theoretical contrastive studies, a Polish-Lithuanian electronic dictionary, and as help for a sworn translator. The semantic annotation being brought into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.

  20. Enriching a biomedical event corpus with meta-knowledge annotation

    OpenAIRE

    Thompson Paul; Nawaz Raheel; McNaught John; Ananiadou Sophia

    2011-01-01

    Abstract Background Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an impo...

  1. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  2. Computational evaluation of TIS annotation for prokaryotic genomes

    OpenAIRE

    Zhu Huaiqiu; Ju Li-Ning; Zheng Xiaobin; Hu Gang-Qing; She Zhen-Su

    2008-01-01

    Abstract Background Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. Results Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for a large-scale quantitative assessment o...

  3. Comparative omics-driven genome annotation refinement: application across Yersiniae.

    Directory of Open Access Journals (Sweden)

    Alexandra C Schrimpe-Rutledge

    Full Text Available Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e., transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) were used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and identified 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen; thus the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation.

  4. Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

    OpenAIRE

    Danuta Roszko; Roman Roszko

    2015-01-01

    In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT) formed for the idea of Polish-Lithuanian theoretical contrastive studies, a Polish-Lithuanian electronic dictionary, and as help for a sworn translator. The semantic annotation being brought into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.

  5. JAABA: interactive machine learning for automatic annotation of animal behavior

    OpenAIRE

    Kabra, Mayank; Robie, Alice A.; Rivera-Alba, Marta; Branson, Steven; Branson, Kristin

    2013-01-01

    We present a machine learning-based system for automatically computing interpretable, quantitative measures of animal behavior. Through our interactive system, users encode their intuition about behavior by annotating a small set of video frames. These manual labels are converted into classifiers that can automatically annotate behaviors in screen-scale data sets. Our general-purpose system can create a variety of accurate individual and social behavior classifiers for different organisms, in...

  6. MITOS: improved de novo metazoan mitochondrial genome annotation.

    Science.gov (United States)

    Bernt, Matthias; Donath, Alexander; Jühling, Frank; Externbrink, Fabian; Florentz, Catherine; Fritzsch, Guido; Pütz, Joern; Middendorf, Martin; Stadler, Peter F

    2013-11-01

    About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq data base together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false positive annotations in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.

  7. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  8. A semi-automatic annotation tool for cooking video

    Science.gov (United States)

    Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

    2013-03-01

    In order to create a cooking assistant application that guides users in preparing dishes relevant to their dietary profiles and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods handled by the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotating the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.

  9. Metagenomic gene annotation by a homology-independent approach

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As ~50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  10. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    Directory of Open Access Journals (Sweden)

    Hamed Hassanzadeh

    2011-04-01

    Full Text Available The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of the Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine-understandable form. Therefore, semantic-level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles to Semantic Annotation, such as multilinguality, scalability, and issues related to diversity and inconsistency in the content of different web pages. Due to the wide range of domains and the dynamic environments in which Semantic Annotation systems must operate, automating the annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like semi-supervised learning and active learning have been utilized. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. Also, we review and analyze machine learning applications for solving semantic annotation problems. For this goal, the article tries to closely study and categorize related research for better understanding and to reach a framework that can map machine learning techniques onto the Semantic Annotation challenges and requirements.

  11. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  12. The Annotation Motivations and Characteristics of HUANG Jie's Annotation of XIE Kang-le's Poems

    Institute of Scientific and Technical Information of China (English)

    李凤娇; 吉定

    2014-01-01

    HUANG Jie devoted himself to annotating the poetry of the Han, Wei and Six Dynasties and achieved a great deal. Annotation of XIE Kang-le's Poems is one of his classic annotations. There are four reasons for HUANG Jie's focus on writing and annotating poetry: first, to educate people and rectify public morals; second, to preserve the quintessence of Chinese culture, revive ancient learning, and restore "the special spirit of the country"; third, concern that Kangle's poetry was being lost and was not easily understood, which led him to correct mistakes and explain misreadings; fourth, a spiritual and emotional affinity with XIE Lingyun. Annotation of XIE Kang-le's Poems inherits the tradition of LI Shan's annotation of Wen Xuan, building on its strengths and avoiding its weaknesses; it comments on the poems with historical facts and annotates poems with poems; its editorial remarks are careful and balanced and pool the strengths of many commentators. The annotation is also distinctive in its treatment of place names and of phonology. These characteristics considerably help readers understand XIE's poems, and the annotation has great academic value.

  13. Experiments with crowdsourced re-annotation of a POS tagging data set

    DEFF Research Database (Denmark)

    Hovy, Dirk; Plank, Barbara; Søgaard, Anders

    2014-01-01

    Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks.

  14. Enriching a biomedical event corpus with meta-knowledge annotation

    Directory of Open Access Journals (Sweden)

    Thompson Paul

    2011-10-01

    Full Text Available Abstract Background Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. Results We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme, as well as its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa. Conclusion By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative
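
    The Kappa range reported above is a chance-corrected agreement score. The abstract does not say which Kappa variant was used, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators; the function name `cohen_kappa` and the example labels are hypothetical and are not taken from the GENIA meta-knowledge corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels; the abstract does not
    state which Kappa variant the authors actually computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two
    # annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four events along one meta-knowledge dimension.
annotator_1 = ["fact", "fact", "hypothesis", "hypothesis"]
annotator_2 = ["fact", "fact", "hypothesis", "fact"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

    In this toy case the annotators agree on three of four items (0.75), but half of that agreement is expected by chance, which is why kappa is lower than raw agreement.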

  15. Construction of an annotated corpus to support biomedical information extraction

    Directory of Open Access Journals (Sweden)

    McNaught John

    2009-10-01

    Full Text Available Abstract Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may

  16. Development and Application of EST-SSR Markers in Pepper

    Institute of Scientific and Technical Information of China (English)

    吴智明; 刘伟强; 唐鑫; 崔竣杰; 程蛟文; 胡开林

    2012-01-01

    A total of 32,295 non-redundant pepper expressed sequence tags (ESTs) were screened with bioinformatics software for SSR motifs. In all, 3,396 SSRs were found, distributed across 3,277 ESTs, giving an EST-SSR frequency of 10.52% and one SSR per 4.46 kb of EST sequence on average. Dinucleotide and trinucleotide repeats were the major types, accounting for 43.02% and 37.84% of all SSRs, respectively. GA/TC (15.99%), AG/CT (11.98%) and AT (11.37%) were the most abundant motifs. Based on the flanking sequences of these SSRs, 420 primer pairs were designed with Primer Premier 5.0 and tested by PCR on 'B072' (a blight-resistant inbred line) and 'B088' (a blight-susceptible inbred line). Of these, 403 (95.96%) amplified successfully, and 76 of them (18.86%) were polymorphic between the two lines. These results show that developing SSR markers from ESTs is an effective and feasible approach in Capsicum annuum L.
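
    The abstract does not name the search software or its thresholds, so the following is only a rough sketch of how perfect di- and trinucleotide SSRs might be located in an EST and how the reported EST-SSR frequency follows from the counts. The minimum repeat numbers in `MIN_REPEATS` and the function names are assumptions made for illustration.

```python
import re

# Assumed thresholds (not from the abstract): minimum number of motif copies
# for a run to count as an SSR, keyed by motif length in bases.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    hits = []
    seq = seq.upper()
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures one motif; \1{k,} demands k or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def ssr_frequency_percent(n_ssrs, n_ests):
    """SSRs per 100 ESTs screened."""
    return 100.0 * n_ssrs / n_ests


# Toy sequence: (GA)8 starting at offset 3 and (AAG)5 starting at offset 22.
toy_est = "TTT" + "GA" * 8 + "CCC" + "AAG" * 5 + "T"
toy_hits = find_ssrs(toy_est)

# Counts taken from the abstract: 3,396 SSRs among 32,295 ESTs.
frequency = ssr_frequency_percent(3396, 32295)
```

    With the abstract's counts, `frequency` rounds to the reported 10.52%. A production search would normally also handle longer motifs and compound or imperfect repeats, which this sketch ignores.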

  17. Development of EST-SSR markers by data mining in three species of shrimp: Litopenaeus vannamei, Litopenaeus stylirostris, and Trachypenaeus birdy.

    Science.gov (United States)

    Pérez, Franklin; Ortiz, Juan; Zhinaula, Mariuxi; Gonzabay, Cesar; Calderón, Jorge; Volckaert, Filip A M J

    2005-01-01

    We report on the data mining of publicly available Litopenaeus vannamei expressed sequence tags (ESTs) to generate simple sequence repeat (SSR) markers and on their transferability between related Penaeid shrimp species. Repeat motifs were found in 3.8% of the evaluated ESTs at a frequency of one repeat every 7.8 kb of sequence data. A total of 206 primer pairs were designed, and 112 loci were amplified with the highest success in L. vannamei. A high percentage (69%) of EST-SSRs were transferable within the genus Litopenaeus. More than half of the amplified products were polymorphic in a small testing panel of L. vannamei. Evaluation of those primers in a larger testing panel showed that 72% of the markers fit Hardy-Weinberg equilibrium, which shows their utility for population genetic analysis. Additionally, a set of 26 of the EST-SSRs were evaluated for Mendelian segregation. A high percentage of monomorphic markers (46%) proved to be polymorphic by single-strand conformation polymorphism analysis. Because of the high number of ESTs available in public databases, a data mining approach similar to the one outlined here might yield high numbers of SSR markers in many animal taxa. PMID:16027992
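
    The abstract reports that most markers fit Hardy-Weinberg equilibrium but does not describe the test used. As an illustration only, the sketch below applies a chi-square goodness-of-fit test to a single locus simplified to two alleles; real microsatellite loci are usually multi-allelic, and the function name, genotype counts and the 5% critical value are assumptions rather than details from the study.

```python
# Chi-square critical value for 1 degree of freedom at the 5% level.
CHI2_CRITICAL_1DF = 3.841


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg fit at a two-allele locus.

    Arguments are observed genotype counts (AA, AB, BB). This is a
    simplified stand-in: multi-allelic SSR loci need more genotype classes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (a monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical panels of 100 individuals each.
fits = hwe_chi_square(25, 50, 25)      # genotype counts match expectation
deviates = hwe_chi_square(50, 0, 50)   # no heterozygotes observed
```

    A statistic below `CHI2_CRITICAL_1DF` would be read as consistent with equilibrium at the 5% level under these assumptions; exact tests are often preferred for small panels.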

  18. Analysis of Gene Expression Profiles in Leaf Tissues of Cultivated Peanuts and Development of EST-SSR Markers and Gene Discovery.

    Science.gov (United States)

    Guo, Baozhu; Chen, Xiaoping; Hong, Yanbin; Liang, Xuanqiang; Dang, Phat; Brenneman, Tim; Holbrook, Corley; Culbreath, Albert

    2009-01-01

    Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector and discarding low-quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition to GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TC) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown function were also identified. Among the unique sequences, 856 EST-SSRs were identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. A few additional INDEL mutations and substitutions were observed in the regions flanking the microsatellites. Some defense-related transcripts were also identified, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. EST data in this study have provided a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited to the NCBI GenBank database with accession numbers ES751523 to ES768453. PMID:19584933

  19. Effective and Efficient Multi-Facet Web Image Annotation

    Institute of Scientific and Technical Information of China (English)

    Jia Chen; Yi-He Zhu; Hao-Fen Wang; Wei Jin; Yong Yu

    2012-01-01

    The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent way is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual effort. To overcome this, automatic image annotation techniques have emerged, which generally select a suitable set of tags for a given image without user intervention. However, there are three main challenges with respect to Web-scale image annotation: scalability, noise resistance and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should be scalable with respect to billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features constituting different facets in annotation. In this paper, we propose a unified framework to tackle the above three challenges for automatic Web image annotation. It mainly involves two components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to solve the scalability and noise-resistance issues. In the latter, a joint feature map is designed to describe different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent tags in the entire annotation, and a structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show that we have achieved 33% performance

  20. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    Directory of Open Access Journals (Sweden)

    Peter E Larsen

    Full Text Available BACKGROUND: Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. METHODOLOGY: We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. CONCLUSIONS: 69% of expressed mycorrhizal JGI "best" gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene

  1. Manual Whisker Annotator (MWA): A Modular Open-Source Tool

    Directory of Open Access Journals (Sweden)

    Brett M. Hewitt

    2016-04-01

    Full Text Available Rodents are key to generating translational data for healthcare research. Behavioural analyses, in particular, are integral to the non-invasive monitoring of rodent health and welfare. Finding quantitative behavioural measures mitigates stress, allowing the animal to behave freely while also enabling the same animal to be studied over the time-course of its life. Locomotion and whisking are both such quantitative behavioural measures, and have been found to be significantly impacted in rodent models of neurodegenerative disease. While automatic trackers of whiskers and locomotion exist, a manual tracker is required to validate these approaches, and also to annotate complex videos where these automatic versions fail. Manually annotating whiskers for research purposes is a long and tedious task, and current software does little to provide an intuitive and simple interface to carry out this task. This led to the creation of the Manual Whisker Annotator (MWA). MWA is an open source, portable whisker annotation tool developed for the Windows platform. Not only does MWA make the process much quicker, it also provides added statistical tools to analyse the data. MWA was developed in C# and WPF using the .NET framework, and could be used in any situation where annotating or tracking multiple targets is desired.

  2. Fast Arc-Annotated Subsequence Matching in Linear Space

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li

    2010-01-01

    An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q, the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base...... is deleted any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both their primary and secondary structure. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive...... the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated...
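
    The problem definition above can be made concrete with a small brute-force checker. This is only an illustration of the definition, not the linear-space algorithm of the paper: it tries every way of keeping len(P) bases of Q and therefore takes exponential time. The function name and the representation of arcs as pairs of 0-based positions are assumptions.

```python
from itertools import combinations


def is_arc_preserving_subsequence(p, p_arcs, q, q_arcs):
    """Return True if (p, p_arcs) can be obtained from (q, q_arcs) by
    deleting bases of q, where deleting a base also deletes its arcs.

    Arcs are pairs of 0-based positions. Exhaustive search over the
    positions of q that are kept, so only suitable for tiny inputs.
    """
    p_arcs = {tuple(sorted(a)) for a in p_arcs}
    q_arcs = {tuple(sorted(a)) for a in q_arcs}
    for kept in combinations(range(len(q)), len(p)):
        # The kept bases must spell out p in order.
        if any(q[k] != c for k, c in zip(kept, p)):
            continue
        # Arcs of q that survive: both endpoints are kept. Re-index them
        # to positions in p and compare with the arcs of p.
        new_index = {k: i for i, k in enumerate(kept)}
        surviving = {
            (new_index[a], new_index[b])
            for a, b in q_arcs
            if a in new_index and b in new_index
        }
        if surviving == p_arcs:
            return True
    return False
```

    For Q = "ACGU" with one arc joining A and U, the string "AU" with an arc between its two bases is an arc-preserving subsequence, but "AU" without the arc is not, because keeping both endpoints also keeps the arc.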

  3. Incorporating Feature-Based Annotations into Automatically Generated Knowledge Representations

    Science.gov (United States)

    Lumb, L. I.; Lederman, J. I.; Aldridge, K. D.

    2006-12-01

    Earth Science Markup Language (ESML) is efficient and effective in representing scientific data in an XML-based formalism. However, features of the data being represented are not accounted for in ESML. Such features might derive from events (e.g., a gap in data collection due to instrument servicing), identifications (e.g., a scientifically interesting area/volume in an image), or some other source. In order to account for features in an ESML context, we consider them from the perspective of annotation, i.e., the addition of information to existing documents without changing the originals. Although it is possible to extend ESML to incorporate feature-based annotations internally (e.g., by extending the XML schema for ESML), there are a number of complicating factors that we identify. Rather than pursuing the ESML-extension approach, we focus on an external representation for feature-based annotations via XML Pointer Language (XPointer). In previous work (Lumb & Aldridge, HPCS 2006, IEEE, doi:10.1109/HPCS.2006.26), we have shown that it is possible to extract relationships from ESML-based representations, and capture the results in the Resource Description Framework (RDF). Thus we explore and report on this same requirement for XPointer-based annotations of ESML representations. As in our past efforts, the Global Geodynamics Project (GGP) allows us to illustrate with a real-world example this approach for introducing annotations into automatically generated knowledge representations.

  4. Annotations of Mexican bullfighting videos for semantic index

    Science.gov (United States)

    Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro

    2015-09-01

    Video annotation is important for web indexing and browsing systems. In order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. It is therefore necessary to generate a database with a semantic index that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme for making complex annotations in a video within the framework of a multimedia search engine project. Each video is partitioned using our segmentation algorithm, which creates shots of different lengths and different numbers of frames. To make complex annotations about the video, we use the ELAN software. The annotations are done in two steps: first, we take note of the whole content of each shot; second, we describe the actions in terms of camera parameters such as direction, position and depth. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset. We also propose new concepts. This methodology makes it possible to generate a database with the information needed to create descriptors and algorithms capable of detecting actions in order to automatically index and classify new bullfighting multimedia content.

  5. ESLO: from transcription to speakers' personal information annotation

    CERN Document Server

    Eshkol, Iris; Friburger, Nathalie

    2011-01-01

    This paper presents the preliminary work to put a French oral corpus and its transcription online. This corpus is the Socio-Linguistic Survey in Orleans, carried out in 1968. First, we digitized the corpus, then we transcribed it manually with the Transcriber software, adding different tags about speakers, time, noise, etc. Each document (audio file and XML file of the transcription) was described by a set of metadata stored in an XML format to allow easy consultation. Second, we added different levels of annotation: recognition of named entities and annotation of personal information about speakers. These two annotation tasks used the CasSys system of transducer cascades. We used and modified a first cascade to recognize named entities. Then we built a second cascade to annotate the designating entities, i.e. information about the speaker. This second cascade parsed the named-entity-annotated corpus. The objective is to locate information about the speaker and, also, what kind of information can designate ...

  6. Semantic annotation for live and posterity logging of video documents

    Science.gov (United States)

    Bertini, Marco; Del Bimbo, Alberto; Nunziati, W.

    2003-06-01

    Broadcasters usually envision two basic applications for video databases: Live Logging and Posterity Logging. The former aims at providing effective annotation of video in quasi-real time and supports extraction of meaningful clips from the live stream; it is usually performed by assistant producers working at the location of the event. The latter provides annotation for later reuse of video material and is the prerequisite for retrieval by content from video digital libraries; it is performed by trained librarians. Both require that annotation is performed, to a great extent, automatically. Video information structure must encompass both low-intermediate level video organization and event relationships that define specific highlights and situations. Analysis of the visual data of the video stream makes it possible to extract hints, identify events and detect highlights. All of this must be supported by a-priori knowledge of the video domain and effective reasoning engines capable of capturing the inherent semantics of the visual events.

  7. Arc-preserving subsequences of arc-annotated sequences

    CERN Document Server

    Popov, Vladimir Yu

    2011-01-01

    Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. The longest arc-preserving common subsequence problem has been introduced as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures. We consider the longest arc-preserving common subsequence problem. In particular, we show that the decision version of the 1-{\sc fragment LAPCS(crossing,chain)} and the decision version of the 0-{\sc diagonal LAPCS(crossing,chain)} are {\bf NP}-complete for some fixed alphabet $\Sigma$ such that $|\Sigma| = 2$. Also we show that if $|\Sigma| = 1$, then the decision version of the 1-{\sc fragment LAPCS(unlimited, plain)} and the decision version of the 0-{\sc diagonal LAPCS(unlimited, plain)} are {\bf NP}-complete.

  8. ProSAT+: visualizing sequence annotations on 3D structure.

    Science.gov (United States)

    Stank, Antonia; Richter, Stefan; Wade, Rebecca C

    2016-08-01

    PROtein Structure Annotation Tool-plus (ProSAT+) is a new web server for mapping protein sequence annotations onto a protein structure and visualizing them simultaneously with the structure. ProSAT+ incorporates many of the features of the preceding ProSAT and ProSAT2 tools but also provides new options for the visualization and sharing of protein annotations. Data are extracted from the UniProt KnowledgeBase, the RCSB PDB and the PDBe SIFTS resource, and visualization is performed using JSmol. User-defined sequence annotations can be added directly to the URL, thus enabling visualization and easy data sharing. ProSAT+ is available at http://prosat.h-its.org. PMID:27284084

  9. Use of Annotations for Component and Framework Interoperability

    Science.gov (United States)

    David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

    2009-12-01

    The popular programming languages Java and C# provide annotations, a form of meta-data construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a meta-data standard for components and models. In version 3.0, framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively, such as implicit multithreading and auto-documenting capabilities, while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to the framework by the use of specific APIs and/or data types, they can more easily be reused both within the framework and outside of it. To compare the effectiveness of an annotation-based framework approach with other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. In a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the

  10. SNP annotation-based whole genomic prediction and selection

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, Luc; Jensen, Just;

    2015-01-01

    into a training (968 pigs) and a validation dataset (304 pigs) by assigning records as before and after January 1, 2012, respectively. SNP were annotated by 14 different classes using Ensembl variant effect prediction. Predictive accuracy and prediction bias were calculated using Bayesian Power LASSO...... SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP groups. Genomic prediction has accuracy comparable to observed phenotype, and use of genomic prediction can be cost...... effective by replacing feed intake measurement. Genomic annotation had less impact on predictive accuracy traits considered here but may be different for other traits. It is the first study to provide useful insights into biological classes of SNP driving the whole genomic prediction for complex traits in...

  11. Code Generation for Protocols from CPN models Annotated with Pragmatics

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge; Kristensen, Lars Michael; Kindler, Ekkart

    of the same model and sufficiently detailed to serve as a basis for automated code generation when annotated with code generation pragmatics. Pragmatics are syntactical annotations designed to make the CPN models descriptive and to address the problem that models with enough details for generating code from...... them tend to be verbose and cluttered. Our code generation approach consists of three main steps, starting from a CPN model that the modeller has annotated with a set of pragmatics that make the protocol structure and the control-flow explicit. The first step is to compute for the CPN model, a set...... of derived pragmatics that identify control-flow structures and operations, e. g., for sending and receiving packets, and for manipulating the state. In the second step, an abstract template tree (ATT) is constructed providing an association between pragmatics and code generation templates. The ATT...

  12. Image Annotation by Latent Community Detection and Multikernel Learning.

    Science.gov (United States)

    Gu, Yun; Qian, Xueming; Li, Qing; Wang, Meng; Hong, Richang; Tian, Qi

    2015-11-01

    Automatic image annotation is an attractive service for users and administrators of online photo sharing websites. In this paper, we propose an image annotation approach that exploits latent semantic community of labels and multikernel learning (LCMKL). First, a concept graph is constructed for labels indicating the relationship between the concepts. Based on the concept graph, semantic communities are explored using an automatic community detection method. For an image to be annotated, a multikernel support vector machine is used to determine the image's latent community from its visual features. Then, a candidate label ranking based approach is determined by intracommunity and intercommunity ranking. Experiments on the NUS-WIDE database and IAPR TC-12 data set demonstrate that LCMKL outperforms some state-of-the-art approaches. PMID:26068319

  13. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

    Directory of Open Access Journals (Sweden)

    Chuming Chen

    Full Text Available The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.

  14. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

    Science.gov (United States)

    Chen, Chuming; Natale, Darren A; Finn, Robert D; Huang, Hongzhan; Zhang, Jian; Wu, Cathy H; Mazumder, Raja

    2011-01-01

    The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization. PMID:21556138

  15. Aldo-keto reductase (AKR) superfamily: genomics and annotation.

    Science.gov (United States)

    Mindnich, Rebekka D; Penning, Trevor M

    2009-07-01

    Aldo-keto reductases (AKRs) are phase I metabolising enzymes that catalyse the reduced nicotinamide adenine dinucleotide (phosphate) (NAD(P)H)-dependent reduction of carbonyl groups to yield primary and secondary alcohols on a wide range of substrates, including aliphatic and aromatic aldehydes and ketones, ketoprostaglandins, ketosteroids and xenobiotics. In so doing they functionalise the carbonyl group for conjugation (phase II enzyme reactions). Although functionally diverse, AKRs form a protein superfamily based on their high sequence identity and common protein fold, the (α/β)8-barrel structure. Well over 150 AKR enzymes, from diverse organisms, have been annotated so far and given systematic names according to a nomenclature that is based on multiple protein sequence alignment and degree of identity. Annotation of non-vertebrate AKRs at the National Center for Biotechnology Information or Vertebrate Genome Annotation (vega) database does not often include the systematic nomenclature name, so the most comprehensive overview of all annotated AKRs is found on the AKR website (http://www.med.upenn.edu/akr/). This site also hosts links to more detailed and specialised information (eg on crystal structures, gene expression and single nucleotide polymorphisms [SNPs]). The protein-based AKR nomenclature allows unambiguous identification of a given enzyme but does not reflect the wealth of genomic and transcriptomic variation that exists in the various databases. In this context, identification of putative new AKRs and their distinction from pseudogenes are challenging. This review provides a short summary of the characteristic features of AKR biochemistry and structure that have been reviewed in great detail elsewhere, and focuses mainly on nomenclature and database entries of human AKRs that so far have not been subject to systematic annotation. Recent developments in the annotation of SNP and transcript variance in AKRs are also summarised. PMID:19706366

  16. Aldo-keto reductase (AKR) superfamily: Genomics and annotation

    Directory of Open Access Journals (Sweden)

    Mindnich Rebekka D

    2009-07-01

    Full Text Available Aldo-keto reductases (AKRs) are phase I metabolising enzymes that catalyse the reduced nicotinamide adenine dinucleotide (phosphate) (NAD(P)H)-dependent reduction of carbonyl groups to yield primary and secondary alcohols on a wide range of substrates, including aliphatic and aromatic aldehydes and ketones, ketoprostaglandins, ketosteroids and xenobiotics. In so doing they functionalise the carbonyl group for conjugation (phase II enzyme reactions). Although functionally diverse, AKRs form a protein superfamily based on their high sequence identity and common protein fold, the (α/β)8-barrel structure. Well over 150 AKR enzymes, from diverse organisms, have been annotated so far and given systematic names according to a nomenclature that is based on multiple protein sequence alignment and degree of identity. Annotation of non-vertebrate AKRs at the National Center for Biotechnology Information or Vertebrate Genome Annotation (vega) database does not often include the systematic nomenclature name, so the most comprehensive overview of all annotated AKRs is found on the AKR website (http://www.med.upenn.edu/akr/). This site also hosts links to more detailed and specialised information (eg on crystal structures, gene expression and single nucleotide polymorphisms [SNPs]). The protein-based AKR nomenclature allows unambiguous identification of a given enzyme but does not reflect the wealth of genomic and transcriptomic variation that exists in the various databases. In this context, identification of putative new AKRs and their distinction from pseudogenes are challenging. This review provides a short summary of the characteristic features of AKR biochemistry and structure that have been reviewed in great detail elsewhere, and focuses mainly on nomenclature and database entries of human AKRs that so far have not been subject to systematic annotation. Recent developments in the annotation of SNP and transcript variance in AKRs are also summarised.

  17. 3D annotation and manipulation of medical anatomical structures

    Science.gov (United States)

    Vitanovski, Dime; Schaller, Christian; Hahn, Dieter; Daum, Volker; Hornegger, Joachim

    2009-02-01

    Although medical scanners are rapidly moving towards a three-dimensional paradigm, the manipulation and annotation/labeling of the acquired data is still performed in a standard 2D environment. Editing and annotation of three-dimensional medical structures is currently a complex and rather time-consuming task, as it is carried out in 2D projections of the original object. A major problem in 2D annotation is depth ambiguity, which requires 3D landmarks to be identified and localized in at least two of the cutting planes. Operating directly in three-dimensional space enables the implicit consideration of the full 3D local context, which significantly increases accuracy and speed. A three-dimensional environment is also more natural, improving the user's comfort and acceptance. The 3D annotation environment requires a three-dimensional manipulation device and display. Using two novel technologies, the Nintendo Wii controller and the Philips 3D WoWvx display, we define an appropriate 3D annotation tool and a suitable 3D visualization monitor. We define a non-coplanar arrangement of four infrared LEDs with known, exact positions, which are tracked by the Wii controller and from which we compute the pose of the device by applying a standard pose estimation algorithm. The novel 3D renderer developed by Philips uses either the Z-value of a 3D volume, or depth information computed from a 2D image, to provide a real 3D experience without special glasses. In this paper we present a new framework for manipulation and annotation of medical landmarks directly in a three-dimensional volume.

  18. Hypertext Annotation: Effects of Presentation Formats and Learner Proficiency on Reading Comprehension and Vocabulary Learning in Foreign Languages

    Science.gov (United States)

    Chen, I-Jung; Yen, Jung-Chuan

    2013-01-01

    This study extends current knowledge by exploring the effect of different annotation formats, namely in-text annotation, glossary annotation, and pop-up annotation, on hypertext reading comprehension in a foreign language and vocabulary acquisition across student proficiencies. User attitudes toward the annotation presentation were also…

  19. Web Image Retrieval Search Engine based on Semantically Shared Annotation

    Directory of Open Access Journals (Sweden)

    Alaa Riad

    2012-03-01

    Full Text Available This paper presents a new majority voting technique that combines the two basic modalities of Web images, textual and visual features, in a re-annotation and search-based framework. The proposed framework treats each web page as a voter on the relatedness of a keyword to the web image. The approach is not merely a combination of low-level image features and textual features; it also takes into consideration the semantic meaning of each keyword, which is expected to enhance retrieval accuracy. Beyond improving the retrieval accuracy of web images, the approach can also annotate unlabeled images.
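
    The page-as-voter idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method, which also uses visual features and keyword semantics; the function name `vote_keywords`, the input shape (one keyword set per web page), and the simple-majority threshold are assumptions made purely for illustration.

```python
from collections import Counter

def vote_keywords(page_keywords, threshold=0.5):
    """Keep keywords that more than `threshold` of the pages vote for.

    page_keywords: list of sets; each set holds the keywords that one
    web page associates with the image (hypothetical input format).
    """
    if not page_keywords:
        return []
    votes = Counter()
    for keywords in page_keywords:
        votes.update(set(keywords))  # each page casts one vote per keyword
    needed = threshold * len(page_keywords)
    return sorted(k for k, n in votes.items() if n > needed)

# Three hypothetical pages embedding the same image.
pages = [{"tiger", "zoo"}, {"tiger", "cat"}, {"tiger", "zoo", "grass"}]
print(vote_keywords(pages))
```

    Keywords mentioned by only one page fall below the majority threshold and are dropped, which is the intuition behind treating pages as voters.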

  20. ONEMercury: Towards Automatic Annotation of Earth Science Metadata

    Science.gov (United States)

    Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

    2012-12-01

    Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. 
We utilize the

  1. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    using the BayesCπ method and applied to 1,272 Duroc pigs with both genotypic and phenotypic records including residual (RFI) and daily feed intake (DFI), average daily gain (ADG) and back fat (BF)). Records were split into a training (968 pigs) and a validation dataset (304 pigs). SNPs were annotated by...... 14 different classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from...

  2. Context, Dependency and Annotation Analysis in Java EE

    OpenAIRE

    Božidar, Darko

    2012-01-01

    The goal of this bachelor’s thesis is to analyze two of Java EE’s features, CDI and annotations, and to use the acquired knowledge to build a simple web application based on CDI and developed annotations. For this purpose it was necessary to clarify what CDI does and what it offers. Previously mentioned features were therefore firstly thoroughly examined to find out what improvements to the Java EE platform, if any, they provide. The main purpose of this thesis is to explore and analyse how t...

  3. Biocuration of functional annotation at the European nucleotide archive.

    Science.gov (United States)

    Gibson, Richard; Alako, Blaise; Amid, Clara; Cerdeño-Tárraga, Ana; Cleland, Iain; Goodgame, Neil; Ten Hoopen, Petra; Jayathilaka, Suran; Kay, Simon; Leinonen, Rasko; Liu, Xin; Pallreddy, Swapna; Pakseresht, Nima; Rajan, Jeena; Rosselló, Marc; Silvester, Nicole; Smirnov, Dmitriy; Toribio, Ana Luisa; Vaughan, Daniel; Zalunin, Vadim; Cochrane, Guy

    2016-01-01

    The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the submission, maintenance and presentation of nucleotide sequence data and related sample and experimental information. In this article we report on ENA in 2015 regarding general activity, notable published data sets and major achievements. This is followed by a focus on sustainable biocuration of functional annotation, an area which has particularly felt the pressure of sequencing growth. The importance of functional annotation, how it can be submitted and the shifting role of the biocurator in the context of increasing volumes of data are all discussed. PMID:26615190

  4. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2007-01-01

    Full Text Available Abstract Background: Members of the family Iridoviridae can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. In the present study, we describe the re-analysis of the Iridoviridae family of complex DNA viruses using a variety of comparative genomic tools to yield a greater consensus among the annotated sequences of its members. Results: A series of genomic sequence comparisons were made among and between the Ranavirus and Megalocytivirus genera in order to identify novel conserved ORFs. Of these two genera, the Megalocytivirus genomes required the greatest number of altered annotations. Prior to our re-analysis, the Megalocytivirus species orange-spotted grouper iridovirus and rock bream iridovirus shared 99% sequence identity, but only 82 out of 118 potential ORFs were annotated; in contrast, we predict that these species share an identical complement of genes. These annotation changes allowed the redefinition of the group of core genes shared by all iridoviruses. Seven new core genes were identified, bringing the total number to 26. Conclusion: Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

  5. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    Directory of Open Access Journals (Sweden)

    Hamilton John P

    2007-10-01

    Full Text Available Abstract Background: Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate (1) the submission of gene annotation to an annotation project, (2) the review of the submitted models by project annotators, and (3) the incorporation of the submitted models in the ongoing annotation effort. Results: We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion: We have applied EuCAP to rice. As of July 2007, the

  6. Child Drama: A Selected and Annotated Bibliography, 1974-1979.

    Science.gov (United States)

    Kennedy, Carol Jean

    The more than 200 entries in this annotated bibliography deal with drama, creative dramatics, and children's theatre. The entries for articles, papers, and books are arranged according to the following categories: (1) bibliographies and references in child drama; (2) theory, research, and methods in preschool dramatic play; (3) theory and research…

  7. Believe It or Not: Adding Belief Annotations to Databases

    CERN Document Server

    Gatterbauer, Wolfgang; Khoussainova, Nodira; Suciu, Dan

    2009-01-01

    We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead. We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technic...

  8. Nutrition and Mental Retardation. An Annotated Bibliography, 1964-1970.

    Science.gov (United States)

    Springer, Ninfa Saturnino

    This annotated bibliography is primarily organized for nutritionists. It presents selected articles published from 1964 to the present. All aspects of nutrition in mental retardation are covered excepting inborn errors of metabolism. Sections are included on: (1) nutrition, birthweight, and mental retardation; (2) nutrition, growth, and mental…

  9. Functional annotation of the human retinal pigment epithelium transcriptome

    NARCIS (Netherlands)

    J.C. Booij (Judith); S. van Soest (Simone); S.M.A. Swagemakers (Sigrid); A.H.W. Essing (Anke); J.H.M. Verkerk (Annemieke); P.J. van der Spek (Peter); T.G.M.F. Gorgels (Theo); A.A.B. Bergen (Arthur)

    2009-01-01

    Background: To determine level, variability and functional annotation of gene expression of the human retinal pigment epithelium (RPE), the key tissue involved in retinal diseases like age-related macular degeneration and retinitis pigmentosa. Macular RPE cells from six selected healthy

  10. The Dartmouth/Rassias Method: An Annotated Bibliography.

    Science.gov (United States)

    Horner, Jeanne; Stansfield, Charles

    The Dartmouth/Rassias method of foreign language instruction, which is used in many American colleges and universities, has inspired much comment in the media. This annotated bibliography describes 17 books, articles, and monographs, as well as a film, which focus on the method. (JB)

  11. Asian American Literature of Hawaii: An Annotated Bibliography.

    Science.gov (United States)

    Hiura, Arnold T.; Sumida, Stephen H.

    This annotated bibliography focuses on the drama, prose fiction, and poetry of people of Chinese, Japanese, Korean, and Filipino descent in Hawaii. All works cited were written in English, between the 1920s and 1970, with the exception of poems translated into English by their authors. The bibliography begins with an overview of the cultural and…

  12. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, Donna M; Scherer, Steven E; Kaul, Rajinder;

    2006-01-01

    After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chr...

  13. An Annotated Bibliography of Gay and Lesbian Communication Studies.

    Science.gov (United States)

    Park, Jan Carl

    1979-01-01

    The 22 entries in this annotated bibliography represent articles that have lesbian women or gay men as subjects and that deal with a specific verbal or nonverbal communication factor. Topics covered in the entries include patterns of self-disclosure in homosexual and heterosexual college students, interpersonal conflict in homosexual relations,…

  14. Pertinent Discussions Toward Modeling the Social Edition: Annotated Bibliographies

    NARCIS (Netherlands)

    R. Siemens; M. Timney; C. Leitch; C. Koolen; A. Garnett

    2012-01-01

    The two annotated bibliographies present in this publication document and feature pertinent discussions toward the activity of modeling the social edition, first exploring reading devices, tools and social media issues and, second, social networking tools for professional readers in the Humanities.

  15. Genome annotations - KOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ....zip File URL: ftp://ftp.biosciencedbc.jp/archive/kome/LATEST/kome_genome_annotat...

  16. The Performance Career of Charles Dickens: An Annotated Bibliography.

    Science.gov (United States)

    Gentile, John Samuel

    Offered in response to the broad appeal of Charles Dickens's performance career to various disciplines, this annotated bibliography lists 40 resources concerned with Dickens's success as a performer interpreting his literary works. The resources are categorized under books, theses and dissertations, articles in scholarly journals, nineteenth…

  17. Effects of Teaching Strategies in Annotated Bibliography Writing

    Science.gov (United States)

    Tan-de Ramos, Jennifer

    2015-01-01

    The study examines the effect of teaching strategies on improving the writing of students at the tertiary level. Specifically, three teaching approaches--the use of modelling, grammar-based, and information element-focused--were tested on their effect on the writing of annotated bibliography in three research classes at a university in Manila.…

  18. Resources for Teaching about Human Rights: An Annotated List.

    Science.gov (United States)

    Totten, Samuel

    1985-01-01

    The following resources are cited in this annotated bibliography dealing with human rights: general references (background readings for teachers and students); classroom materials; fiction; audiovisuals; periodicals; and organizations and associations dedicated to the investigation of human rights infractions or education and communication on…

  19. Annotated Bibliography on Return Migration to Puerto Rico.

    Science.gov (United States)

    Carrasquillo, Angela; Carrasquillo, Ceferino

    This paper is an annotated bibliography on return migration from the mainland United States to Puerto Rico. An introduction defines the term "return migration" in the specific context of the Puerto Rican community. The introduction is followed by the bibliography, which lists and summarizes research studies and works dealing with demographic data…

  20. Generating Protocol Software from CPN Models Annotated with Pragmatics

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge; Kristensen, Lars M.; Kindler, Ekkart

    2013-01-01

    and verify protocol software, but limited work exists on using CPN models of protocols as a basis for automated code generation. The contribution of this paper is a method for generating protocol software from a class of CPN models annotated with code generation pragmatics. Our code generation method...

  1. Recall Oriented Search on the Web using Semantic Annotations

    NARCIS (Netherlands)

    Kaptein, A.M.; Broek, E.L. van den; Koot, G.; Huis in't Veld, M.A.A.

    2013-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic investi

  2. Social Sciences in the People's Republic of China: Annotated Bibliography.

    Science.gov (United States)

    Parker, Franklin

    This annotated bibliography includes 18 journal articles, books, newspaper stories, and conference papers focusing on official Chinese policy toward the role of the social sciences. The impact of the Chinese Cultural Revolution and the establishment of the Chinese Academy of Social Sciences in 1977 are the subjects of most of the listed sources.…

  3. Annotation: Neurofeedback--Train Your Brain to Train Behaviour

    Science.gov (United States)

    Heinrich, Hartmut; Gevensleben, Holger; Strehl, Ute

    2007-01-01

    Background: Neurofeedback (NF) is a form of behavioural training aimed at developing skills for self-regulation of brain activity. Within the past decade, several NF studies have been published that tend to overcome the methodological shortcomings of earlier studies. This annotation describes the methodical basis of NF and reviews the evidence…

  4. Optimizing high performance computing workflow for protein functional annotation.

    Science.gov (United States)

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  5. The Challenges of Blended Learning Using a Media Annotation Tool

    Science.gov (United States)

    Douglas, Kathy A.; Lang, Josephine; Colasante, Meg

    2014-01-01

    Blended learning has been evolving as an important approach to learning and teaching in tertiary education. This approach incorporates learning in both online and face-to-face modes and promotes deep learning by incorporating the best of both approaches. An innovation in blended learning is the use of an online media annotation tool (MAT) in…

  6. Annotated Bibliography on Ethology in Education. Ecological Theory of Teaching.

    Science.gov (United States)

    Miller, Kevin; Frey, Susan

    This annotated bibliography focuses on the ethological study of child development and the educational process. Topics covered include: (1) evolution; (2) dominance hierarchies and social organization; (3) agonistic, affiliative, and epistemic behaviors; (4) nonverbal communication; (5) play; (6) biological constraints on learning; and (7) relevant…

  7. An annotated bibliography of parasitic Isopoda (Crustacea) of Chondrichthyes

    Directory of Open Access Journals (Sweden)

    Plínio Soares Moreira

    1978-01-01

    Full Text Available This annotated bibliography is an attempt to bring together all available published records on the parasitic isopods of chondrichthyan fishes as a basic reference source. An effort was made to synonymise old names according to the presently accepted scientific names.

  8. An Annotated Bibliography on the Severely and Profoundly Mentally Retarded.

    Science.gov (United States)

    Cass, Michael, Comp.; Schilit, Jeffrey, Comp.

    Presented is an annotated bibliography with approximately 250 entries relating to the severely and profoundly retarded. Citations are listed alphabetically by author under the following categories: assessments, measurements, evaluations; associations; attending behavior; behavior modification; books; classical conditioning; cognitive development;…

  9. Annotating Evidence Based Clinical Guidelines: A Lightweight Ontology

    NARCIS (Netherlands)

    R. Hoekstra; A. de Waard; R. Vdovjak

    2012-01-01

    This paper describes a lightweight ontology for representing annotations of declarative evidence based clinical guidelines. We present the motivation and requirements for this representation, based on an analysis of several guidelines. The ontology provides the means to connect clinical questions an

  10. Reliability and effectiveness of clickthrough data for automatic image annotation

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; Vries, A.P. de; Delopoulos, A.

    2010-01-01

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manua

  11. Humanitarian Curriculum and Psychosocial Interventions: An Annotated Bibliography

    Science.gov (United States)

    Retamal, Gonzalo; Low, Maria

    2010-01-01

    This paper proposes an analytical description of the impact of violence and natural disasters on schoolchildren. It attempts to explore the present state of the art in psychosocial aspects of education and the curriculum in humanitarian settings. This is carried out through a compilation and a brief annotated bibliography of existing literature…

  12. Annotation in School English: A Social Semiotic Historical Account

    Science.gov (United States)

    Jewitt, Carey; Bezemer, Jeff; Kress, Gunther

    2011-01-01

    What exactly has changed in the production of secondary school English over the last decade? To provide one part of an answer to that question, this paper takes the practice of annotation--a defining activity of the subject English in the UK seldom researched--and uses it as a device for uncovering aspects of changes in the subject. The…

  13. Functional annotation of the human retinal pigment epithelium transcriptome

    NARCIS (Netherlands)

    J.C. Booij; S. van Soest; S.M.A. Swagemakers; A.H.W. Essing; A.J.M.H. Verkerk; P.J. van der Spek; T.G.M.F. Gorgels; A.A.B. Bergen

    2009-01-01

    ABSTRACT: BACKGROUND: To determine level, variability and functional annotation of gene expression of the human retinal pigment epithelium (RPE), the key tissue involved in retinal diseases like age-related macular degeneration and retinitis pigmentosa. Macular RPE cells from six selected healthy hu

  14. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.
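
    The idea that low-variation regions mark constrained sequence can be illustrated with a toy sketch. It is not the study's analysis: per-site variability here is simply the fraction of aligned sequences that differ from the most common base, a crude stand-in for a mutation rate. The function names, window size, threshold and example sequences are all hypothetical.

```python
from collections import Counter

def site_variability(aligned):
    """Fraction of sequences differing from the most common base, per column.

    aligned: list of equal-length strings (a toy multiple alignment).
    """
    n = len(aligned)
    rates = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        rates.append(1 - most_common_count / n)
    return rates

def conserved_windows(rates, window=3, max_rate=0.1):
    """Start indices of windows whose mean variability is at most max_rate."""
    starts = []
    for i in range(len(rates) - window + 1):
        if sum(rates[i:i + window]) / window <= max_rate:
            starts.append(i)
    return starts

# Four made-up resequenced individuals; the first three columns are invariant.
seqs = ["ACGTAC", "ACGAAT", "ACGTCC", "ACGGAG"]
rates = site_variability(seqs)
print(rates)
print(conserved_windows(rates))
```

    In this toy alignment only the window covering the invariant columns is flagged as a candidate constrained region.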

  15. Annotated Bibliography on the Teaching of Psychology: 1999.

    Science.gov (United States)

    Johnson, David E.; Schroder, Simone I.

    2000-01-01

    Presents an annotated bibliography covering awards, computers and technology, critical thinking, developmental psychology and aging, ethics, graduate education and training issues, high school psychology, history, introductory psychology, learning and cognition, perception/physiological/comparative psychology, research methods and research-related…

  16. An Annotated Guide to Periodical Literature: Higher Education.

    Science.gov (United States)

    Diener, Thomas J., Ed.; Trower, David L., Ed.

    Ninety-six periodicals selected for their pertinence to an understanding of higher education are listed in alphabetical order in this annotated guide. These publications either focus on higher education in the United States and other countries, or frequently publish information about colleges and universities. The bibliography is designed as a…

  17. Postsecondary Peer Cooperative Learning Programs: Annotated Bibliography 2016

    Science.gov (United States)

    Arendale, David R., Comp.

    2016-01-01

    Purpose: This 2016 annotated bibliography reviews seven postsecondary peer cooperative learning programs that have been implemented nationally to increase student achievement. Methodology: An extensive literature search was conducted of published journal articles, newspaper accounts, book chapters, books, ERIC documents, theses and dissertations,…

  18. Gender, Science, and Technology: A Selected Annotated Bibliography.

    Science.gov (United States)

    Eldredge, Mary; And Others

    1990-01-01

    Presents 196 annotated listings of works on science, technology, and gender, under 9 headings: Biography and History; Women Scientists; Science Education; Feminists Look at Science and Technology; Effects of Technology on Women; Medicine and Reproductive Technologies in Women's Lives; Women and Evolution; Women and Agriculture; and Gender,…

  19. An Annotated Bibliography of Resources for Humanistic and Psychological Education.

    Science.gov (United States)

    Pine, Gerald J.

    1979-01-01

    Compiled for New England Teacher Corps Conference on Interpersonal Relations (March 1978), this annotated bibliography offers suggestions and examples of how humanistic and psychological education theories have been and can be put into practice. Sections include (1) classroom exercises; (2) resource directories; (3) films; and (4) humanistic…

  20. Personalization in crowd-driven annotation for cultural heritage collections

    NARCIS (Netherlands)

    Dijkshoorn, C.; Oosterman, J.; Aroyo, L.; Houben, G.J.

    2012-01-01

    Many cultural heritage institutions are confronted with a big challenge when it comes to adapting the process of registration, annotation and digitization of their collections to meet the new technological demands for providing their collections online with Web and mobile technologies. With limited

  1. Education and Training. Annotated Bibliography. Author and Subject Index.

    Science.gov (United States)

    United Nations Food and Agriculture Organization, Rome (Italy).

    Food and Agriculture Organization (FAO) publications and documents issued by the Human Resources and Institutions division and by other technical divisions in the technical, economic, and social fields are selected, annotated and indexed in this bibliography. Documents issued prior to 1967 are not included but can be found in the Rural…

  2. Chicano and Chicana Literature, 1992-96: An Annotated Bibliography.

    Science.gov (United States)

    Salazar, Carmen; Herrera-Sobek, Maria

    1997-01-01

    Presents an updated annotated bibliography on recent Chicano publications that includes anthologies focusing on Latino works. This bibliography contains literary works and criticism written in English and Spanish. Its limited scope does not allow the inclusion of entries for articles published in magazines and journals or for unpublished doctoral…

  3. Higher Education Finance. An Annotated Bibliography, Report 96-2.

    Science.gov (United States)

    Doyle, William

    This annotated bibliography on higher education finance lists 79 journal articles, books, conference papers, and reports originally published from 1973 through 1995 with most published in the 1990s. Citations include lengthy analytical summaries and critiques. The bibliography is presented in six sections which cover the following topics: (1)…

  4. Automatic image annotation and retrieval using group sparsity.

    Science.gov (United States)

    Zhang, Shaoting; Huang, Junzhou; Li, Hongsheng; Metaxas, Dimitris N

    2012-06-01

    Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, whereas properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization-based feature selection algorithm to leverage both the sparsity and clustering properties of features, and incorporate it into the image annotation task. Using this group-sparsity-based method, the whole group of features [e.g., red green blue (RGB) or hue, saturation, and value (HSV)] is either selected or removed. Thus, we do not need to extract this group of features when new data comes. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both the keyword similarity and the relevance feedback. Thus, keyword similarity is modeled in the annotation framework. We also show that our framework can be employed in image retrieval tasks by selecting different image pairs. Extensive experiments are designed to compare the performance between features, feature combinations, and regularization-based feature selection methods applied on the image annotation task, which gives insight into the properties of features in the image annotation task. The experimental results demonstrate that the group-sparsity-based method is more accurate and stable than others. PMID:22249744
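    The all-or-nothing group selection described in this abstract can be illustrated with the standard group-lasso proximal step (group soft-thresholding). This is a minimal sketch of that generic technique, not the authors' algorithm, which also exploits clustering properties and relevance feedback; the function name, the example weights, the group layout, and the threshold `lam` are all hypothetical.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||w_g||_2.

    Each group of weights is shrunk as a unit: if its Euclidean norm is
    at most lam, the whole group is set to zero (the feature group is
    removed); otherwise every weight in the group is scaled by the same
    factor (the feature group is kept).
    """
    out = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        scale = 0.0 if norm <= lam else 1.0 - lam / norm
        for i in idx:
            out[i] = weights[i] * scale
    return out


# Hypothetical example: indices 0-1 form one feature group (say, RGB
# statistics) and indices 2-4 form another (say, HSV statistics).
weights = [3.0, 4.0, 0.1, 0.2, 0.1]
groups = [[0, 1], [2, 3, 4]]
shrunk = group_soft_threshold(weights, groups, lam=1.0)

# Groups that survive the shrinkage are the "selected" feature groups.
selected = [g for g in groups if any(shrunk[i] != 0.0 for i in g)]
```

    Because a removed group has every weight set to exactly zero, the corresponding features need not be extracted for new data, which is the practical benefit the abstract points to.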

  5. An Annotated Bibliography of the Gestalt Methods, Techniques, and Therapy

    Science.gov (United States)

    Prewitt-Diaz, Joseph O.

    The purpose of this annotated bibliography is to provide the reader with a guide to relevant research in the area of Gestalt therapy, techniques, and methods. The majority of the references are journal articles written within the last 5 years or documents easily obtained through interlibrary loans from local libraries. These references were…

  6. An Annotated Bibliography of Recent Prison Library Literature.

    Science.gov (United States)

    Akey, Sharon Ann

    This annotated bibliography on prison library literature is the result of research for a master's degree. The only prior source for this work was "Prison Libraries--Bibliography" by David Gillespie, and from that and journal sources the bibliography was compiled. The works cited are divided into three categories: California prison libraries (15…

  7. Classic Religious Books for Children: An Annotated Bibliography.

    Science.gov (United States)

    Campbell, Carol, Comp.

    This annotated bibliography of religious books for children contains approximately 450 books, one-fifth of which are Judaic. The books' current availability has been verified using Web sites such as those of individual publishers, the Library of Congress, Amazon.com, or Barnes&Noble.com. New subject headings have been added, such as Kwanza,…

  8. Effects of E-Textbook Instructor Annotations on Learner Performance

    Science.gov (United States)

    Dennis, Alan R.; Abaci, Serdar; Morrone, Anastasia S.; Plaskoff, Joshua; McNamara, Kelly O.

    2016-01-01

    With additional features and increasing cost advantages, e-textbooks are becoming a viable alternative to paper textbooks. One important feature offered by enhanced e-textbooks (e-textbooks with interactive functionality) is the ability for instructors to annotate passages with additional insights. This paper describes a pilot study that examines…

  9. Analysis of LYSA-calculus with explicit confidentiality annotations

    DEFF Research Database (Denmark)

    Gao, Han; Nielson, Hanne Riis

    2006-01-01

    Recently there has been an increased research interest in applying process calculi in the verification of cryptographic protocols due to their ability to formally model protocols. This work presents LYSA with explicit confidentiality annotations for indicating the expected behavior of target prot...

  10. Systematic Functional Annotation and Visualization of Biological Networks.

    Science.gov (United States)

    Baryshnikova, Anastasia

    2016-06-22

    Large-scale biological networks represent relationships between genes, but our understanding of how networks are functionally organized is limited. Here, I describe spatial analysis of functional enrichment (SAFE), a systematic method for annotating biological networks and examining their functional organization. SAFE visualizes the network in 2D space and measures the continuous distribution of functional enrichment across local neighborhoods, producing a list of the associated functions and a map of their relative positioning. I applied SAFE to annotate the Saccharomyces cerevisiae genetic interaction similarity network and protein-protein interaction network with gene ontology terms. SAFE annotations of the genetic network matched manually derived annotations, while taking less than 1% of the time, and proved robust to noise and sensitive to biological signal. Integration of genetic interaction and chemical genomics data using SAFE revealed a link between vesicle-mediated transport and resistance to the anti-cancer drug bortezomib. These results demonstrate the utility of SAFE for examining biological networks and understanding their functional organization. PMID:27237738

  11. Annotation of Articles from Scientific American and Student Understanding

    Science.gov (United States)

    Knapp, John, II

    1976-01-01

    Reports on a study in which high school biology students were divided into two groups: one read "Scientific American" articles and the other group read annotated "Scientific American" articles. Although there was no significant difference between means on an achievement measure of the groups, the author reports that students preferred the…

  12. A Selected, Annotated Bibliography on Employment of Minority Engineers.

    Science.gov (United States)

    National Academy of Sciences - National Research Council, Washington, DC. Assembly of Engineering.

    The annotated bibliography is intended to inform those concerned with personnel guidance, recruiting, and hiring in industry, research, education, and government about the available publications relating to the employment of Black, Mexican American, Puerto Rican, and American Indian engineers. To facilitate its usefulness, the bibliography is…

  13. ACOUSTICS IN ARCHITECTURAL DESIGN, AN ANNOTATED BIBLIOGRAPHY ON ARCHITECTURAL ACOUSTICS.

    Science.gov (United States)

    DOELLE, LESLIE L.

    THE PURPOSE OF THIS ANNOTATED BIBLIOGRAPHY ON ARCHITECTURAL ACOUSTICS WAS--(1) TO COMPILE A CLASSIFIED BIBLIOGRAPHY, INCLUDING MOST OF THOSE PUBLICATIONS ON ARCHITECTURAL ACOUSTICS, PUBLISHED IN ENGLISH, FRENCH, AND GERMAN WHICH CAN SUPPLY A USEFUL AND UP-TO-DATE SOURCE OF INFORMATION FOR THOSE ENCOUNTERING ANY ARCHITECTURAL-ACOUSTIC DESIGN…

  14. The Olympic Games and World Politics: A Select Annotated Bibliography.

    Science.gov (United States)

    Meyer, Evelyn S.

    1984-01-01

    This 62-item annotated bibliography lists books and journal articles published over last decade (historical guides and surveys, memoirs, speeches, essays, biographies, government documents, critical analyses) on the history of politics in modern Olympic games and use of games in politics. A brief history of the games is included with 24…

  15. An annotated corpus for the analysis of VP ellipsis

    NARCIS (Netherlands)

    Bos, Johan; Spenader, J.

    2011-01-01

    Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We an

  16. Annotated Bibliography of Research in the Teaching of English

    Science.gov (United States)

    Beach, Richard; Bigelow, Martha; Dillon, Deborah; Dockter, Jessie; Galda, Lee; Helman, Lori; Kapoor, Richa; Ngo, Bic; O'Brien, David; Sato, Mistilina; Scharber, Cassie; Jorgensen, Karen; Liang, Lauren; Braaksma, Martine; Janssen, Tanja

    2009-01-01

    This article presents an annotated bibliography of research works about digital/technology tools for literacy instruction, discourse/cultural analysis, literacy, literary response/literature/narrative, media-information literacy/media use, professional development/teacher education related to English/language arts, reading, second language…

  17. History of American Communication Education: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Friedrich, Gustav W.

    Noting that only a fraction of the articles in speech journals have been concerned with the history of speech education in the United States, this annotated bibliography provides a broad guide to the materials necessary for understanding that history. The 45 citations are organized in six sections concerned with: (1) historical background; (2)…

  18. Bibliographies and Annotations from Speakers and Class Participants.

    Science.gov (United States)

    California Univ., Berkeley. Office of Resources for International and Area Studies.

    This annotated bibliography focuses on recent children's literature that deals with other cultures. The books in the bibliography are set in or about Africa (29 selections), Ancient Egypt (3 selections), East Asia (20 selections), India (7 selections), Latin America (3 selections), Middle East (5 selections), and Russia (9 selections). The…

  19. An Annotated Bibliography of Games and Simulations in Consumer Education.

    Science.gov (United States)

    Blucker, Gwen

    Thirty-two games and simulations relating to consumer education comprise this annotated bibliography designed to aid the teacher of adult basic education students and others in their search for teaching devices. Topics covered in the various simulations include money management, insurance, credit, credit unions, consumer law, consumer frauds,…

  1. Automatically Annotated Mapping for Indoor Mobile Robot Applications

    DEFF Research Database (Denmark)

    Özkil, Ali Gürcan; Howard, Thomas J.

    2012-01-01

    This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...

  2. Genome Annotation in a Community College Cell Biology Lab

    Science.gov (United States)

    Beagley, C. Timothy

    2013-01-01

    The Biology Department at Salt Lake Community College has used the IMG-ACT toolbox to introduce a genome mapping and annotation exercise into the laboratory portion of its Cell Biology course. This project provides students with an authentic inquiry-based learning experience while introducing them to computational biology and contemporary learning…

  3. Semi-automatic Annotation System for OWL-based Semantic Search

    Directory of Open Access Journals (Sweden)

    C.-H. Liu

    2009-11-01

    Current keyword search engines such as Google and Yahoo return an enormous number of unsuitable results. One possible solution is to annotate textual web data with semantics to enable semantic search rather than keyword search. However, purely manual annotation is very time-consuming. Further, searching for a high-level concept such as metaphor cannot be done if the annotation is done at a low abstraction level. We thus present a semi-automatic annotation system consisting of an automatic annotator and a manual annotator. Against the web ontology language (OWL) terms defined by Protégé, the former annotates the textual web data using the Knuth-Morris-Pratt (KMP) algorithm, while the latter allows a user to use the terms to annotate metaphors with high abstraction. The resulting semantically enhanced textual web document can be semantically processed by other web services, such as the information retrieval system and the recommendation system shown in our example.
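    The automatic annotator in this record matches ontology terms against text with the Knuth-Morris-Pratt algorithm. The following is a minimal sketch of KMP-based term matching; the `annotate` helper, its case-insensitive matching, and the sample terms are illustrative assumptions and are not taken from the paper or from Protégé.

```python
def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep scanning for overlapping matches
    return hits


def annotate(text, terms):
    """Hypothetical helper: map each term to its match offsets in text,
    ignoring case."""
    lowered = text.lower()
    return {term: kmp_find_all(lowered, term.lower()) for term in terms}


result = annotate("Semantic search beats keyword search", ["search", "ontology"])
```

    KMP scans the text once per term without backtracking, so the cost per term is linear in the text length plus the term length.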

  4. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will;

    2009-01-01

    Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species...

  5. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera.

    Science.gov (United States)

    Lewis, James J; van der Burg, Karin R L; Mazo-Vargas, Anyi; Reed, Robert D

    2016-09-13

    Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.

  6. Semantic Annotation Framework For Intelligent Information Retrieval Using KIM Architecture

    Directory of Open Access Journals (Sweden)

    Sanjay Kumar Malik

    2010-11-01

    Due to the explosion of information/knowledge on the web and the wide use of search engines for desired information, the role of knowledge management (KM) is becoming more significant in an organization. Knowledge management in an organization is used to create, capture, store, share, retrieve and manage information efficiently. The semantic web, an intelligent and meaningful web, tends to provide a promising platform for knowledge management systems and vice versa, since they have the potential to give each other the real substance for machine-understandable web resources, which in turn will lead to intelligent, meaningful and efficient information retrieval on the web. Today, the challenge for the web community is to integrate the distributed heterogeneous resources on the web with the objective of an intelligent web environment focusing on data semantics and user requirements. Semantic Annotation (SA), which is about assigning to the entities in the text links to their semantic descriptions, is being widely used. Various tools like KIM, Amaya etc. may be used for semantic annotation. In this paper, we introduce semantic annotation as one of the key technologies in an intelligent web environment, then revisit, review, discuss and explore Knowledge Management and Semantic Annotation. A Knowledge Management Framework and a Framework for Semantic Annotation and Semantic Search with Knowledge Base (GATE) and Ontology have been presented. Then the KIM Annotation platform architecture, including KIM Ontology (KIMO), KIM Knowledge Base and KIM front ends, has been highlighted. Finally, intelligent pattern search and the concerned GATE framework with a KIM Annotation example have been illustrated towards intelligent information retrieval.

  7. SAS- Semantic Annotation Service for Geoscience resources on the web

    Science.gov (United States)

    Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

    2015-12-01

    There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosemantic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use the existing resources based on standard vocabularies that are encoded and published using semantic technologies.

  8. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    OpenAIRE

    Hamilton John P; Campbell Matthew; Thibaud-Nissen Françoise; Zhu Wei; Buell C

    2007-01-01

    Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging k...

  9. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  10. Aspekte der bioinformatischen Analyse und Annotation des Genoms von Rhodopirellula baltica

    OpenAIRE

    Teeling, Hanno

    2004-01-01

    This thesis focuses on the bioinformatic analysis and annotation of the genome of the marine planctomycete Rhodopirellula baltica. A comprehensive bioinformatic pipeline was set up and established that comprises gene prediction, annotation and visualization tools. Considerable effort was put into the manual annotation process. The annotation of the genome of Rhodopirellula baltica revealed that this organism is specialized in the aerobic degradation of complex carbohydrates. Its genome harbors...

  11. Creating reference gene annotation for the mouse C57BL6/J genome assembly

    OpenAIRE

    Mudge, Jonathan M; Harrow, Jennifer

    2015-01-01

    Annotation on the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, al...

  12. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs)

    Directory of Open Access Journals (Sweden)

    Wang Zhouxi

    2013-02-01

    Background: The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site. Results: Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase. Conclusions

  13. Effects of Annotations and Homework on Learning Achievement: An Empirical Study of Scratch Programming Pedagogy

    Science.gov (United States)

    Su, Addison Y. S.; Huang, Chester S. J.; Yang, Stephen J. H.; Ding, T. J.; Hsieh, Y. Z.

    2015-01-01

    In Taiwan elementary schools, Scratch programming has been taught for more than four years. Previous studies have shown that personal annotation is a useful learning method that improves learning performance. An annotation-based Scratch programming (ASP) system provides for the creation, sharing, and review of annotations and homework solutions in…

  14. Effects of Reviewing Annotations and Homework Solutions on Math Learning Achievement

    Science.gov (United States)

    Hwang, Wu-Yuin; Chen, Nian-Shing; Shadiev, Rustam; Li, Jin-Sing

    2011-01-01

    Previous studies have demonstrated that making annotations can be a meaningful and useful learning method that promotes metacognition and enhances learning achievement. A web-based annotation system, Virtual Pen (VPEN), which provides for the creation and review of annotations and homework solutions, has been developed to foster learning process…

  15. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    Directory of Open Access Journals (Sweden)

    Abdulaziz M Al-Swailem

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast-evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole-genome sequencing of the organism.
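    The ORF length criterion mentioned in this abstract (ORF>300 bp) can be illustrated with a simple forward-strand ORF scan. This is a minimal sketch under stated assumptions: it uses the standard ATG start codon and the TAA/TAG/TGA stop codons, checks only the three forward reading frames, and is not the pipeline used in the study; the function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the longest ORF (ATG through the first in-frame stop codon,
    inclusive) found in the three forward reading frames, or "" if none."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        # Walk the frame one codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


# Toy contig (hypothetical); a real filter would keep contigs whose
# longest ORF exceeds 300 bp.
contig = "CCATGAAATTTTAGGG"
orf = longest_orf(contig)
passes_length_filter = len(orf) > 300
```

    A fuller analysis would also scan the reverse complement, since an assembled contig may represent either strand.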

  16. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Science.gov (United States)

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  17. Incorporating evolution of transcription factor binding sites into annotated alignments

    Indian Academy of Sciences (India)

    Abha S Bais; Steffen Grossmann; Martin Vingron

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits) are generated. Moreover, the pair-profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution. Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification
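
    For readers unfamiliar with the baseline that SimAnn builds on, the following is a minimal sketch of the standard Smith-Waterman local alignment score. It does not include SimAnn's profile states or the eSimAnn evolutionary models; the scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen only for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score between a and b (score only, no traceback).

    Plain Smith-Waterman with a linear gap penalty; the scoring
    parameters are illustrative defaults, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            best = max(best, H[i][j])
    return best
```

    The clamp at zero is what makes the alignment local: a poorly matching prefix never drags down a later high-scoring region.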

  18. Code Generation from Pragmatics Annotated Coloured Petri Nets

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge

    using a sub-class of CPNs, called Pragmatics Annotated CPNs (PA-CPNs). PA-CPNs give structure to the protocol models and allow the models to be annotated with code generation pragmatics. These pragmatics are used by our code generation approach to identify and execute the appropriate code generation...... third party libraries and the code should be easily usable by third party code. Finally, the code should be readable by developers with expertise on the considered platforms. In this thesis, we show that our code generation approach is able to generate code for a wide range of platforms without altering...... such as games and rich web applications. Finally, we conclude the evaluation of the criteria of our approach by using the WebSocket PA-CPN model to show that we are able to verify fairly large protocols....

  19. Using hypermedia annotations to teach vocabulary on the Web

    Directory of Open Access Journals (Sweden)

    Bahman Gorjian

    2011-02-01

    Full Text Available This project measured the effect of using hypermedia annotations on short and long-term vocabulary retention in teaching vocabulary through Web-based language learning activities. A total of 62 university students were randomly assigned into two homogeneous groups; and then both groups were given a pretest. Both groups covered 12 expository passages selected by the researchers from the BBC website. The subjects had to sit for an immediate quiz to measure the short-term effect of the treatment and finally, at the end of the course and a two-week interval, subjects sat for their post-test. Findings revealed that there was a significant effect of the hypermedia annotations on the retention of vocabulary in the short term (p<0.05). However, the post-test results indicated that the effect of the treatment in the long term faded away, and the significance of the means was not sufficiently high to reject the null hypothesis.

  20. High-fidelity data embedding for image annotation.

    Science.gov (United States)

    He, Shan; Kirovski, Darko; Wu, Min

    2009-02-01

    High fidelity is a demanding requirement for data hiding, especially for images with artistic or medical value. This correspondence proposes a high-fidelity image watermarking for annotation with robustness to moderate distortion. To achieve the high fidelity of the embedded image, we introduce a visual perception model that aims at quantifying the local tolerance to noise for arbitrary imagery. Based on this model, we embed two kinds of watermarks: a pilot watermark that indicates the existence of the watermark and an information watermark that conveys a payload of several dozen bits. The objective is to embed 32 bits of metadata into a single image in such a way that it is robust to JPEG compression and cropping. We demonstrate the effectiveness of the visual model and the application of the proposed annotation technology using a database of challenging photographic and medical images that contain a large amount of smooth regions.

  1. Genome annotation of a Saccharomyces sp. lager brewer's yeast

    Directory of Open Access Journals (Sweden)

    Patricia Marcela De León-Medina

    2016-09-01

    Full Text Available The genome of lager brewer's yeast is a hybrid, with Saccharomyces eubayanus and Saccharomyces cerevisiae as sub-genomes. Due to their specific use in the beer industry, relatively little information is available. The genome of brewing yeast was sequenced and annotated in this study. We obtained a genome size of 22.7 Mbp that consisted of 133 scaffolds, with 65 scaffolds larger than 10 kbp. With respect to the annotation, 9939 genes were obtained, and when they were submitted to a local alignment, we found that 53.93% of these genes corresponded to S. cerevisiae, while another 42.86% originated from S. eubayanus. Our results confirm that our strain is a hybrid of at least two different genomes.

  2. Genome annotation of a Saccharomyces sp. lager brewer's yeast.

    Science.gov (United States)

    De León-Medina, Patricia Marcela; Elizondo-González, Ramiro; Damas-Buenrostro, Luis Cástulo; Geertman, Jan-Maarten; Van den Broek, Marcel; Galán-Wong, Luis Jesús; Ortiz-López, Rocío; Pereyra-Alférez, Benito

    2016-09-01

    The genome of lager brewer's yeast is a hybrid, with Saccharomyces eubayanus and Saccharomyces cerevisiae as sub-genomes. Due to their specific use in the beer industry, relatively little information is available. The genome of brewing yeast was sequenced and annotated in this study. We obtained a genome size of 22.7 Mbp that consisted of 133 scaffolds, with 65 scaffolds larger than 10 kbp. With respect to the annotation, 9939 genes were obtained, and when they were submitted to a local alignment, we found that 53.93% of these genes corresponded to S. cerevisiae, while another 42.86% originated from S. eubayanus. Our results confirm that our strain is a hybrid of at least two different genomes. PMID:27330999

  3. Automation and Validation of Annotation for Hindi Anaphora Resolution

    Directory of Open Access Journals (Sweden)

    Pardeep Singh

    2015-10-01

    Full Text Available The process of labelling any language genre by which one can extract useful information is called annotation. This provides syntactic information about a word or a word phrase. In this paper, an effort has been made to provide the algorithm for semiautomatic annotation for Hindi text to cater to anaphora resolution only. The study was conducted on twelve files of Ranchi Express available in EMILLE corpus. The corpus is originally tagged for demonstrative pronouns. The detection of the pronouns is supported by the incorporation of seven tags. However, the semantic interpretation of the demonstrative pronoun is not supported in the original corpus. In this paper an effort has been made to automate the process of tagging as well as the handling of semantic information through additional tags. It was conducted on 1485 demonstrative pronouns. The average precision, recall and F measure are 74, 71 and 72, respectively.
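
    The reported figures can be checked against the usual definition of the F measure as the harmonic mean of precision and recall. The sketch below assumes the balanced F1 form; the abstract does not state which F variant was used, and because the values are averages over files, the F of the averaged precision and recall need not equal the averaged F exactly.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
reported_precision = 74
reported_recall = 71
print(round(f_measure(reported_precision, reported_recall), 2))
```

    With precision 74 and recall 71 the harmonic mean is 2 x 74 x 71 / 145, roughly 72.5, which is consistent with the reported F measure of 72.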

  4. Using social annotation and web log to enhance search engine

    CERN Document Server

    Nguyen, Vu Thanh

    2009-01-01

    Search services have developed rapidly on the social Internet and help web users find their documents easily, so finding the best search method is a constant aim. This paper introduces a hybrid method combining the LPageRank algorithm and the Social Sim Rank algorithm. LPageRank uses link structure to rank the priority of a page; it does not consider the content of the page or the content of the query. Therefore, we use social annotations to create a latent semantic association between queries and annotations. In this model, we use the SocialPageRank and LPageRank algorithms to enhance the accuracy of the search system. To experiment with and evaluate the proposed model, we applied it to the Music Machine Website and its web logs.
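
    The link-structure ranking that this abstract attributes to LPageRank can be illustrated with the classic PageRank power iteration. This is a sketch of standard PageRank only, offered as an assumption about the general idea; the abstract gives no details of LPageRank, SocialPageRank or Social Sim Rank, and the damping factor 0.85 and the toy graphs are illustrative.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Classic PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with
    no outgoing links spread their rank evenly over all pages.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))
```

    Note that the ranking depends only on who links to whom, which is exactly the limitation the abstract raises: neither page content nor the query enters the computation.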

  5. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second...... are not effective unless programmers are told how and when they are beneficial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs perform significantly......, some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism...

  6. Ontology Based Document Annotation: Trends and Open Research Problems

    OpenAIRE

    Corcho, Oscar

    2006-01-01

    Metadata is used to describe documents and applications, improving information seeking and retrieval and its understanding and use. Metadata can be expressed in a wide variety of vocabularies and languages, and can be created and maintained with a variety of tools. Ontology based annotation refers to the process of creating metadata using ontologies as their vocabularies. We present similarities and differences with respect to other approaches for metadata creation, and describe languages and...

  7. Annotated bibliography on developmental states, political settlements and citizen formation

    OpenAIRE

    Laura Routley

    2012-01-01

    ESID Research Associate Laura Routley has produced an annotated bibliography as part of ESID's inception phase. This publication offers a starting point for investigating some of ESID's research themes, drawing together what is known about the politics of what works, and laying out current insights into the key political processes which operate to build effective states and enable inclusive development. The bibliography concentrates on scholarship focused on three areas: Developmental States, P...

  8. 06491 Abstracts Collection -- Digital Historical Corpora - Architecture, Annotation, and Retrieval

    OpenAIRE

    Burnard, Lou; Dobreva, Milena; Fuhr, Norbert; Lüdeling, Anke

    2007-01-01

    From 03.12.06 to 08.12.06, the Dagstuhl Seminar 06491 "Digital Historical Corpora - Architecture, Annotation, and Retrieval" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. T...

  9. Annotation Method (AM): SE41_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available -MS Fragment Viewer (http://webs2.kazusa.or.jp/msmsfragmentviewer/) are used for annotation and identification of the compounds. ... ...e used for primary database search. Peaks with no hit to these databases are then selected to secondary sear...ch using EX-HR2 (http://webs2.kazusa.or.jp/mfsearcher/) databases. After the database search processes, each

  10. GANESH: Software for Customized Annotation of Genome Regions

    OpenAIRE

    Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek

    2003-01-01

    GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching an...

  11. AUTOMATIC ANNOTATION OF QUERY RESULTS FROM DEEP WEB DATABASE

    OpenAIRE

    Chaitanya Bhosale

    2015-01-01

    In recent years, web database extraction and annotation has received more attention from the database. When a search query is submitted to the interface, the search result page is generated. Search Result Records (SRRs) are the result pages obtained from web database (WDB) and these SRRs are used to display the result for each query. Every SRR contains multiple data units similar to one semantic. These search results can be used in many web applic...

  12. Grass buffers for playas in agricultural landscapes: An annotated bibliography

    Science.gov (United States)

    Melcher, Cynthia P.; Skagen, Susan K.

    2005-01-01

    This bibliography and associated literature synthesis (Melcher and Skagen, 2005) was developed for the Playa Lakes Joint Venture (PLJV). The PLJV sought compilation and annotation of the literature on grass buffers for protecting playas from runoff containing sediments, nutrients, pesticides, and other contaminants. In addition, PLJV sought information regarding the extent to which buffers may attenuate the precipitation runoff needed to fill playas, and avian use of buffers. We emphasize grass buffers, but we also provide information on other buffer types.

  13. Learning a Hybrid Architecture for Sequence Regression and Annotation

    OpenAIRE

    Zhang, Yizhe; Henao, Ricardo; Carin, Lawrence; Zhong, Jianling; Hartemink, Alexander J.

    2015-01-01

    When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden stat...

  14. In depth annotation of the Anopheles gambiae mosquito midgut transcriptome

    OpenAIRE

    Padrón, Alejandro; Molina-Cruz, Alvaro; Quinones, Mariam; Ribeiro, José MC; Ramphul, Urvashi; Rodrigues, Janneth; Shen, Kui; Haile, Ashley; Ramirez, José Luis; Barillas-Mury, Carolina

    2014-01-01

    Background Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. However, annotation needs to be refined and verified experimentally, as most predicted transcripts have been identified by comparative analysis with genomes from other species. The mosquito midgut—the first organ to interact with Plasmodium parasites—mounts effective antiplasmodial responses that limit parasite survival and disease transmission. High-th...

  15. An annotated checklist of the Cladocera (Crustacea: Branchiopoda) of Colombia.

    Science.gov (United States)

    Kotov, Alexey A; Fuentes-Reinés, Juan M

    2015-11-20

    Based on the revision of available literature on the Colombian Cladocera (Crustacea: Branchiopoda), we present an annotated checklist, with taxonomical comments for all taxa recorded since the start of research on this group in the country in 1913. We have listed 101 valid taxa, of which most records belong to the Caribbean region of Colombia. The situation in Colombian Cladocera taxonomy is, at present, unfavorable for any realistic conclusions on biodiversity, ecology and biogeography.

  16. svm PRAT: SVM-based Protein Residue Annotation Toolkit

    OpenAIRE

    Rangwala, Huzefa; Kauffman, Christopher; Karypis, George

    2009-01-01

    Background Over the last decade several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines as they provide accurate and generalizable prediction models. Results We present a general purpose protein residue annotation toolkit (svm PRAT) to allow biologists to formulate residue-wise prediction problems. svm PRAT f...

  17. Towards an hybrid system for annotating brain MRI images

    OpenAIRE

    Mechouche, Ammar; Golbreich, Christine; Gibaud, Bernard

    2006-01-01

    This paper describes a method combining symbolic and numerical techniques for annotating brain Magnetic Resonance images. The goal is to assist existing automatic labelling methods which are mostly statistical in nature and do not work very well in certain situations such as the presence of lesions. The system uses existing statistical methods for generating ABox facts that constitute a set of initial information sufficient for fruitful reasoning. The reasoning is supported by an OWL ontology...

  18. Trans-ethnic Meta-analysis and Functional Annotation Illuminates the Genetic Architecture of Fasting Glucose and Insulin.

    Science.gov (United States)

    Liu, Ching-Ti; Raghavan, Sridharan; Maruthur, Nisa; Kabagambe, Edmond Kato; Hong, Jaeyoung; Ng, Maggie C Y; Hivert, Marie-France; Lu, Yingchang; An, Ping; Bentley, Amy R; Drolet, Anne M; Gaulton, Kyle J; Guo, Xiuqing; Armstrong, Loren L; Irvin, Marguerite R; Li, Man; Lipovich, Leonard; Rybin, Denis V; Taylor, Kent D; Agyemang, Charles; Palmer, Nicholette D; Cade, Brian E; Chen, Wei-Min; Dauriz, Marco; Delaney, Joseph A C; Edwards, Todd L; Evans, Daniel S; Evans, Michele K; Lange, Leslie A; Leong, Aaron; Liu, Jingmin; Liu, Yongmei; Nayak, Uma; Patel, Sanjay R; Porneala, Bianca C; Rasmussen-Torvik, Laura J; Snijder, Marieke B; Stallings, Sarah C; Tanaka, Toshiko; Yanek, Lisa R; Zhao, Wei; Becker, Diane M; Bielak, Lawrence F; Biggs, Mary L; Bottinger, Erwin P; Bowden, Donald W; Chen, Guanjie; Correa, Adolfo; Couper, David J; Crawford, Dana C; Cushman, Mary; Eicher, John D; Fornage, Myriam; Franceschini, Nora; Fu, Yi-Ping; Goodarzi, Mark O; Gottesman, Omri; Hara, Kazuo; Harris, Tamara B; Jensen, Richard A; Johnson, Andrew D; Jhun, Min A; Karter, Andrew J; Keller, Margaux F; Kho, Abel N; Kizer, Jorge R; Krauss, Ronald M; Langefeld, Carl D; Li, Xiaohui; Liang, Jingling; Liu, Simin; Lowe, William L; Mosley, Thomas H; North, Kari E; Pacheco, Jennifer A; Peyser, Patricia A; Patrick, Alan L; Rice, Kenneth M; Selvin, Elizabeth; Sims, Mario; Smith, Jennifer A; Tajuddin, Salman M; Vaidya, Dhananjay; Wren, Mary P; Yao, Jie; Zhu, Xiaofeng; Ziegler, Julie T; Zmuda, Joseph M; Zonderman, Alan B; Zwinderman, Aeilko H; Adeyemo, Adebowale; Boerwinkle, Eric; Ferrucci, Luigi; Hayes, M Geoffrey; Kardia, Sharon L R; Miljkovic, Iva; Pankow, James S; Rotimi, Charles N; Sale, Michele M; Wagenknecht, Lynne E; Arnett, Donna K; Chen, Yii-Der Ida; Nalls, Michael A; Province, Michael A; Kao, W H Linda; Siscovick, David S; Psaty, Bruce M; Wilson, James G; Loos, Ruth J F; Dupuis, Josée; Rich, Stephen S; Florez, Jose C; Rotter, Jerome I; Morris, Andrew P; Meigs, James B

    2016-07-01

    Knowledge of the genetic basis of the type 2 diabetes (T2D)-related quantitative traits fasting glucose (FG) and insulin (FI) in African ancestry (AA) individuals has been limited. In non-diabetic subjects of AA (n = 20,209) and European ancestry (EA; n = 57,292), we performed trans-ethnic (AA+EA) fine-mapping of 54 established EA FG or FI loci with detailed functional annotation, assessed their relevance in AA individuals, and sought previously undescribed loci through trans-ethnic (AA+EA) meta-analysis. We narrowed credible sets of variants driving association signals for 22/54 EA-associated loci; 18/22 credible sets overlapped with active islet-specific enhancers or transcription factor (TF) binding sites, and 21/22 contained at least one TF motif. Of the 54 EA-associated loci, 23 were shared between EA and AA. Replication with an additional 10,096 AA individuals identified two previously undescribed FI loci, chrX FAM133A (rs213676) and chr5 PELO (rs6450057). Trans-ethnic analyses with regulatory annotation illuminate the genetic architecture of glycemic traits and suggest gene regulation as a target to advance precision medicine for T2D. Our approach to utilize state-of-the-art functional annotation and implement trans-ethnic association analysis for discovery and fine-mapping offers a framework for further follow-up and characterization of GWAS signals of complex trait loci. PMID:27321945

  19. Multilingual Twitter Sentiment Classification: The Role of Human Annotators.

    Directory of Open Access Journals (Sweden)

    Igor Mozetič

    Full Text Available What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.

  20. Fire-induced water-repellent soils, an annotated bibliography

    Science.gov (United States)

    Kalendovsky, M.A.; Cannon, S.H.

    1997-01-01

    The development and nature of water-repellent, or hydrophobic, soils are important issues in evaluating hillslope response to fire. The following annotated bibliography was compiled to consolidate existing published research on the topic. Emphasis was placed on the types, causes, effects and measurement techniques of water repellency, particularly with respect to wildfires and prescribed burns. Each annotation includes a general summary of the respective publication, as well as highlights of interest to this focus. Although some references on the development of water repellency without fires, the chemistry of hydrophobic substances, and remediation of water-repellent conditions are included, coverage of these topics is not intended to be comprehensive. To develop this database, the GeoRef, Agricola, and Water Resources Abstracts databases were searched for appropriate references, and the bibliographies of each reference were then reviewed for additional entries. Additional references will be added to this bibliography as they become available. The annotated bibliography can be accessed on the Web at http://geohazards.cr.usgs.gov/html_files/landslides/ofr97-720/biblio.html. A database consisting of the references and keywords is available through a link at the above address. This database was compiled using EndNote2 plus software by Niles and Associates, and is necessary to search the database.

  1. Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome

    Energy Technology Data Exchange (ETDEWEB)

    Milacic, Marija; Haw, Robin, E-mail: robin.haw@oicr.on.ca; Rothfels, Karen; Wu, Guanming [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada); Croft, David; Hermjakob, Henning [European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD (United Kingdom); D’Eustachio, Peter [Department of Biochemistry, NYU School of Medicine, New York, NY 10016 (United States); Stein, Lincoln [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada)

    2012-11-08

    Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.

  2. Multilingual Twitter Sentiment Classification: The Role of Human Annotators.

    Science.gov (United States)

    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina

    2016-01-01

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered. PMID:27149621
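
    As an illustration of what an annotator agreement measure looks like, the sketch below computes Cohen's kappa for two annotators labelling the same tweets. The abstract says only that "various annotator agreement measures" were applied, so the choice of Cohen's kappa, the function name and the toy labels are assumptions, not the measures used in the study.

```python
from collections import Counter


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["negative", "neutral", "positive", "positive"]
annotator_2 = ["negative", "neutral", "positive", "neutral"]
print(cohen_kappa(annotator_1, annotator_2))
```

    Plain kappa treats the three classes as unordered. Since the abstract reports evidence that the sentiment classes are perceived as ordered, a measure that penalises negative/positive disagreements more than negative/neutral ones may be the more natural fit.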

  3. A topic modeling approach for web service annotation

    Directory of Open Access Journals (Sweden)

    Leandro Ordóñez-Ante

    2014-06-01

    Full Text Available The actual implementation of semantic-based mechanisms for service retrieval has been restricted, given the resource-intensive procedure involved in the formal specification of services, which generally comprises associating semantic annotations to their documentation sources. Typically, a developer performs such a procedure by hand, requiring specialized knowledge on models for semantic description of services (e.g. OWL-S, WSMO, SAWSDL), as well as formal specifications of knowledge. Thus, this semantic-based service description procedure turns out to be a cumbersome and error-prone task. This paper introduces a proposal for service annotation, based on processing web service documentation for extracting information regarding its offered capabilities. By uncovering the hidden semantic structure of such information through statistical analysis techniques, we are able to associate meaningful annotations to the services' operations/resources, while grouping those operations into non-exclusive semantically related categories. This research paper belongs to the TelComp 2.0 project, which Colciencias and the University of Cauca founded in cooperation.

  4. Development and annotation of perennial Triticeae ESTs and SSR markers.

    Science.gov (United States)

    Bushman, B Shaun; Larson, Steve R; Mott, Ivan W; Cliften, Paul F; Wang, Richard R-C; Chatterton, N Jerry; Hernandez, Alvaro G; Ali, Shahjahan; Kim, Ryan W; Thimmapuram, Jyothi; Gong, George; Liu, Lei; Mikel, Mark A

    2008-10-01

    Triticeae contains hundreds of species of both annual and perennial types. Although substantial genomic tools are available for annual Triticeae cereals such as wheat and barley, the perennial Triticeae lack sufficient genomic resources for genetic mapping or diversity research. To increase the amount of sequence information available in the perennial Triticeae, three expressed sequence tag (EST) libraries were developed and annotated for Pseudoroegneria spicata, a mixture of both Elymus wawawaiensis and E. lanceolatus, and a Leymus cinereus x L. triticoides interspecific hybrid. The ESTs were combined into unigene sets of 8 780 unigenes for P. spicata, 11 281 unigenes for Leymus, and 7 212 unigenes for Elymus. Unigenes were annotated based on putative orthology to genes from rice, wheat, barley, other Poaceae, Arabidopsis, and the non-redundant database of the NCBI. Simple sequence repeat (SSR) markers were developed, tested for amplification and polymorphism, and aligned to the rice genome. Leymus EST markers homologous to rice chromosome 2 genes were syntenous on Leymus homeologous groups 6a and 6b (previously 1b), demonstrating promise for in silico comparative mapping. All ESTs and SSR markers are available on an EST information management and annotation database (http://titan.biotec.uiuc.edu/triticeae/). PMID:18923529
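
    To make the idea of mining SSRs from EST sequences concrete, here is a minimal sketch that finds perfect tandem repeats with a regular expression. The abstract does not describe the SSR detection software or thresholds used, so the function name and the defaults (motifs of 2 to 6 bp, at least 4 repeats) are hypothetical.

```python
import re


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 4) -> list:
    """Find perfect simple sequence repeats in a DNA string.

    Returns (start, unit, repeat_count) tuples. The shortest repeating
    unit is tried first. Units that are themselves repeats (for example
    "AA") are not collapsed to a shorter motif in this sketch.
    """
    # A captured unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_ssrs("GGCATATATATCCG"))
```

    Primers for marker development would then be designed in the flanking sequence on either side of each reported repeat, a step not shown here.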

  5. Semantic annotation of Web data applied to risk in food.

    Science.gov (United States)

    Hignette, Gaëlle; Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Doussot, David; Haemmerlé, Ollivier; Mettler, Eric; Soler, Lydie

    2008-11-30

    A preliminary step to risk in food assessment is the gathering of experimental data. In the framework of the Sym'Previus project (http://www.symprevius.org), a complete data integration system has been designed, grouping data provided by industrial partners and data extracted from papers published in the main scientific journals of the domain. Those data have been classified by means of a predefined vocabulary, called ontology. Our aim is to complement the database with data extracted from the Web. In the framework of the WebContent project (www.webcontent.fr), we have designed a semi-automatic acquisition tool, called @WEB, which retrieves scientific documents from the Web. During the @WEB process, data tables are extracted from the documents and then annotated with the ontology. We focus on the data tables as they contain, in general, a synthesis of data published in the documents. In this paper, we explain how the columns of the data tables are automatically annotated with data types of the ontology and how the relations represented by the table are recognised. We also give the results of our experimentation to assess the quality of such an annotation.

  6. AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences.

    Science.gov (United States)

    Grau, Jan; Reschke, Maik; Erkes, Annett; Streubel, Jana; Morgan, Richard D; Wilson, Geoffrey G; Koebnik, Ralf; Boch, Jens

    2016-01-01

    Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs and the overall TALE diversity in Xanthomonas spp. are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present 'AnnoTALE', a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. PMID:26876161

  7. Experimental Evidence for a Revision in the Annotation of Putative Pyridoxamine 5'-Phosphate Oxidases P(N/M)P from Fungi.

    Directory of Open Access Journals (Sweden)

    Tatiana Domitrovic

    Full Text Available Pyridoxinamine 5'-phosphate oxidases (P(N/M)P oxidases) that bind flavin mononucleotide (FMN) and oxidize pyridoxine 5'-phosphate or pyridoxamine 5'-phosphate to form pyridoxal 5'-phosphate (PLP) are an important class of enzymes that play a central role in cell metabolism. Failure to generate an adequate supply of PLP is very detrimental to most organisms and is often clinically manifested as a neurological disorder in mammals. In this study, we analyzed the function of YLR456W and YPR172W, two homologous genes of unknown function from S. cerevisiae that have been annotated as putative P(N/M)P oxidases based on sequence homology. Different experimental approaches indicated that neither protein catalyzes PLP formation nor binds FMN. On the other hand, our analysis confirmed the enzymatic activity of Pdx3, the S. cerevisiae protein previously implicated in PLP biosynthesis by genetic and structural characterization. After a careful sequence analysis comparing the putative and confirmed P(N/M)P oxidases, we found that the protein domain (PF01243) that led to the YLR456W and YPR172W annotation is a poor indicator of P(N/M)P oxidase activity. We suggest that a combination of two Pfam domains (PF01243 and PF10590) present in Pdx3 and other confirmed P(N/M)P oxidases would be a stronger predictor of this molecular function. This work exemplifies the importance of experimental validation to rectify genome annotation and proposes a revision in the annotation of at least 400 sequences from a wide variety of fungal species that are homologous to YLR456W and are currently misrepresented as putative P(N/M)P oxidases.

  8. Identification of novel endogenous antisense transcripts by DNA microarray analysis targeting complementary strand of annotated genes

    Directory of Open Access Journals (Sweden)

    Kohama Chihiro

    2009-08-01

    Full Text Available Abstract Background Recent transcriptomic analyses in mammals have uncovered the widespread occurrence of endogenous antisense transcripts, termed natural antisense transcripts (NATs). NATs are transcribed from the opposite strand of the gene locus and are thought to control sense gene expression, but the mechanism of such regulation is as yet unknown. Although several thousand potential sense-antisense pairs have been identified in mammals, examples of functionally characterized NATs remain limited. To identify NAT candidates suitable for further functional analyses, we performed DNA microarray-based NAT screening using mouse adult normal tissues and mammary tumors to target not only the sense orientation but also the complementary strand of the annotated genes. Results First, we designed microarray probes to target the complementary strand of genes for which an antisense counterpart had been identified only in human public cDNA sources, but not in the mouse. We observed a prominent expression signal from 66.1% of 635 target genes, and 58 of these genes showed tissue-specific expression. Expression analyses of selected examples (Acaa1b and Aard) confirmed their dynamic transcription in vivo. Although interspecies conservation of NAT expression was previously investigated by the presence of cDNA sources in both species, our results suggest that there are more examples of human-mouse conserved NATs that could not be identified by cDNA sources. We also designed probes to target the complementary strand of well-characterized genes, including oncogenes, and compared the expression of these genes between mammary cancerous tissues and non-pathological tissues. We found that antisense expression of 95 genes of 404 well-annotated genes was markedly altered in tumor tissue compared with that in normal tissue and that 19 of these genes also exhibited changes in sense gene expression. These results highlight the importance of NAT expression in the regulation ...

  9. Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

    Directory of Open Access Journals (Sweden)

    Tie Hua Zhou

    2015-05-01

    Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on keyword query architecture that models the probability distribution of image annotations, allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated with semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed in heterogeneous large data sources. Our experiments on the SBU (collected by Stony Brook University) database show that (i) our integrated annotation contains higher quality representatives and semantic matches; and (ii) annotation integration can indeed improve image search result quality.

  10. An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

    OpenAIRE

    Bell, Michael J; Colin S Gillespie; Swan, Daniel; Lord, Phillip

    2012-01-01

    Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases ...

  11. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
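
    The gene-level figures quoted above can be read with the help of a small sketch. It assumes the convention common in gene-prediction assessments, where sensitivity is the fraction of reference genes predicted correctly and specificity is the fraction of predictions that are correct, and it assumes that only an exact match counts as correct; the abstract does not spell out either definition. The function name and the tuple representation of a gene model are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Compare two collections of gene models.

    Each model is a hashable structure, e.g. a tuple of (start, end) exons.
    A prediction counts as correct only if it matches a reference model
    exactly (an assumption made for this sketch).
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity

# Toy data: two of four reference genes recovered among five predictions.
reference = [((1, 50),), ((100, 150), (200, 260)), ((400, 480),), ((600, 700),)]
predicted = [((1, 50),), ((100, 150), (200, 260)), ((400, 470),),
             ((800, 900),), ((950, 990),)]
print(gene_level_accuracy(predicted, reference))
```

    Under this reading, high sensitivity with low specificity indicates that most reference genes are found but many additional or imperfect models are also predicted.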

  12. Sequencing and annotated analysis of an Estonian human genome.

    Science.gov (United States)

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In the present study we describe the sequencing and annotated analysis of an individual Estonian genome. Using SOLID technology we generated 2,449,441,916 50-bp reads. Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope also enables the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5%, with total coverage of 107.72 Gb, resulting in a mean fold coverage of 34.6. We found 3,482,975 SNPs, of which 352,492 were novel. 21,222 SNPs were in coding regions: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small insertion/deletion polymorphisms, of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found a region in chromosome 6 to be enriched with coding SNPs and CNVs. This study confirms previous findings that our genomes are more complex and variable than previously thought. Therefore, sequencing of personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understanding of genome function.
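
    The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the abstract; the variable names are illustrative, and the reference length is a derived value (total coverage divided by mean fold coverage), not a figure the abstract states.

```python
# Coding SNP categories as reported in the abstract.
coding_snps = {
    "synonymous": 10_649,
    "missense": 10_360,
    "nonsense": 155,
    "frameshift": 58,
}
total_coding = sum(coding_snps.values())

# Share of variants described as novel.
novel_snp_fraction = 352_492 / 3_482_975
novel_indel_fraction = 133_969 / 285_864

# Reference length implied by 107.72 Gb of coverage at 34.6-fold (derived).
implied_reference_gb = 107.72 / 34.6

print(total_coding)
print(round(novel_snp_fraction, 3), round(novel_indel_fraction, 3))
print(round(implied_reference_gb, 2))
```

    The four coding categories sum to the reported 21,222 coding SNPs, roughly 10% of SNPs and 47% of small indels are novel, and the implied reference length is about 3.11 Gb.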

  13. A Method for Producing Reminiscence Videos by Using Photo Annotations

    Science.gov (United States)

    Kuwahara, Noriaki; Kuwabara, Kazuhiro; Abe, Shinji; Susami, Kenji; Yasuda, Kiyoshi

    Providing good home-based care to people with dementia is becoming an important issue as the size of the elderly population increases. One of the main problems in providing such care is that it must be constantly provided without interruption, and this puts a great burden on caregivers, who are often family members. Networked Interaction Therapy is our name for methods designed to relieve the stress of people suffering from dementia as well as that of their family members. This therapy aims to provide a system that interacts with people with dementia by utilizing various engaging stimuli. One such stimulus is a reminiscence video created from old photo albums, which is a promising way to hold a dementia sufferer's attention for a long time. In this paper, we present an authoring tool to assist in the production of a reminiscence video by using photo annotations. We conducted interviews with several video creators on how they used photo annotations such as date, title and subject of photos when they produced the reminiscence videos. Based on the creators' comments, we have defined an ontology for representing the creators' knowledge of how to add visual effects to a reminiscence video. Subsequently, we developed an authoring tool that automatically produces a reminiscence video from the annotated photos. Subjective evaluation of the quality of reminiscence videos produced with our tool indicates that they give impressions similar to those produced by creators using conventional video editing software. The effectiveness of presenting such a video to people with dementia is also discussed.

  14. Construction of coffee transcriptome networks based on gene annotation semantics.

    Science.gov (United States)

    Castillo, Luis F; Galeano, Narmer; Isaza, Gustavo A; Gaitán, Alvaro

    2012-07-24

    Gene annotation is a process that encompasses multiple approaches on the analysis of nucleic acids or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and Interproscan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.

  15. BBP: Brucella genome annotation with literature mining and curation

    Directory of Open Access Journals (Sweden)

    He Yongqun

    2006-07-01

    Full Text Available Abstract Background Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. Sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway to integrate these resources will greatly facilitate Brucella research. Brucella genome data in current databases is largely derived from computational analysis without experimental validation typically found in peer-reviewed publications. It is partially due to the lack of a literature mining and curation system able to efficiently incorporate the large amount of literature data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would increase understanding of complicated Brucella pathogenesis mechanisms. Results The Brucella Bioinformatics Portal (BBP) is developed to integrate existing Brucella genome data and analysis tools with literature mining and curation. The BBP InterBru database and Brucella Genome Browser allow users to search and analyze genes of 4 currently available Brucella genomes and link to more than 20 existing databases and analysis programs. Brucella literature publications in PubMed are extracted and can be searched by a TextPresso-powered natural language processing method, a MeSH browser, a keywords search, and an automatic literature update service. To efficiently annotate Brucella genes using the large amount of literature publications, a literature mining and curation system coined Limix is developed to integrate computational literature mining methods with a PubSearch-powered manual curation and management system. The Limix system is used to quickly find and confirm 107 Brucella gene mutations including 75 genes shown to be essential for Brucella virulence. The 75 genes are further clustered using COG. In addition, 62 Brucella genetic interactions are extracted from literature publications. These ...

  16. Automatisierte semantische Annotation von Fußballspielen aus Fernsehen [Automated semantic annotation of football matches from television]

    OpenAIRE

    Siles Canales, Francisco

    2014-01-01

    The main goal of this dissertation is to investigate mechanisms for building a computational system for the automated semantic annotation of televised football matches. An abstract model is used to represent football and to store and retrieve the information needed to answer football-related questions. The central hypothesis is that the model, based on the trajectories of the targets on the pitch, with data ...

  17. An annotated checklist of the Greek Stonefly Fauna (Insecta: Plecoptera).

    Science.gov (United States)

    Karaouzas, Ioannis; Andriopoulou, Argyro; Kouvarda, Theodora; Murányi, Dávid

    2016-05-17

    An overview of the Greek stonefly (Plecoptera) fauna is presented as an annotated index of all available published records. These records have resulted in an updated species list reflecting current taxonomy and species distributions of the Greek peninsula and islands. Currently, a total of 71 species and seven subspecies belonging to seven families and 19 genera are reported from Greece. There is high species endemicity of the Leuctridae and Nemouridae, particularly on the Greek islands. The endemics known from Greece comprise thirty species representing 42% of the Greek stonefly fauna. The remaining taxa are typical Balkan and Mediterranean species.

  18. An Annotation Scheme for Reichenbach's Verbal Tense Structure

    CERN Document Server

    Derczynski, Leon

    2012-01-01

    In this paper we present RTMML, a markup language for the tenses of verbs and temporal relations between verbs. There is a richness to tense in language that is not fully captured by existing temporal annotation schemata. Following Reichenbach we present an analysis of tense in terms of abstract time points, with the aim of supporting automated processing of tense and temporal relations in language. This allows for precise reasoning about tense in documents, and the deduction of temporal relations between the times and verbal events in a discourse. We define the syntax of RTMML, and demonstrate the markup in a range of situations.
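
    To make the idea of abstract time points concrete, the sketch below classifies a tense from the relative order of Reichenbach's three points: event time (E), reference time (R) and speech time (S). It does not use RTMML syntax, which the abstract does not show; the function name, the integer encoding of time points and the small lookup table are assumptions covering only a subset of the orderings Reichenbach describes.

```python
def reichenbach_tense(e, r, s):
    """Name an English tense from integer positions of E, R and S.

    Only a few common orderings are covered; anything else returns "other".
    """
    def rel(a, b):
        return "<" if a < b else ">" if a > b else "="

    # Keys are (E relative to R, R relative to S).
    table = {
        ("<", "<"): "past perfect",     # E before R before S
        ("=", "<"): "simple past",      # E and R coincide, both before S
        ("<", "="): "present perfect",  # E before R, R coincides with S
        ("=", "="): "simple present",   # all three coincide
        ("=", ">"): "simple future",    # E and R coincide, both after S
        ("<", ">"): "future perfect",   # E before R, R after S
    }
    return table.get((rel(e, r), rel(r, s)), "other")

print(reichenbach_tense(1, 2, 3))  # e.g. "She had left (before he arrived)"
print(reichenbach_tense(1, 2, 2))  # e.g. "She has left"
```

    Keeping R separate from E is what distinguishes "has left" from "left", a distinction that an annotation with only event and speech times cannot express.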

  19. An annotated checklist of the Greek Stonefly Fauna (Insecta: Plecoptera).

    Science.gov (United States)

    Karaouzas, Ioannis; Andriopoulou, Argyro; Kouvarda, Theodora; Murányi, Dávid

    2016-01-01

    An overview of the Greek stonefly (Plecoptera) fauna is presented as an annotated index of all available published records. These records have resulted in an updated species list reflecting current taxonomy and species distributions of the Greek peninsula and islands. Currently, a total of 71 species and seven subspecies belonging to seven families and 19 genera are reported from Greece. There is high species endemicity of the Leuctridae and Nemouridae, particularly on the Greek islands. The endemics known from Greece comprise thirty species representing 42% of the Greek stonefly fauna. The remaining taxa are typical Balkan and Mediterranean species. PMID:27395093

  20. Computational analyses and annotations of the Arabidopsis peroxidase gene family

    DEFF Research Database (Denmark)

    Østergaard, Lars; Pedersen, Anders Gorm; Jespersen, Hans M.;

    1998-01-01

    Classical heme-containing plant peroxidases have been ascribed a wide variety of functional roles related to development, defense, lignification and hormonal signaling. More than 40 peroxidase genes are now known in Arabidopsis thaliana for which functional association is complicated by a general...... lack of peroxidase substrate specificity. Computational analysis was performed on 30 near full-length Arabidopsis peroxidase cDNAs for annotation of start codons and signal peptide cleavage sites. A compositional analysis revealed that 23 of the 30 peroxidase cDNAs have 5' untranslated regions...

  1. SmashCommunity: A metagenomic annotation and analysis tool

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U;

    2010-01-01

    SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate...... the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses. AVAILABILITY: SmashCommunity is freely available at http://www.bork.embl.de/software/smash CONTACT: bork@embl.de....

  2. Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

    CERN Document Server

    Yan, Yan; Fung, Glenn; Dy, Jennifer

    2012-01-01

    Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances ground-truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) annotator variable expertise for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed ...

  3. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene. Results: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms...

  4. A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data

    CERN Document Server

    Zimmermann, Antoine; Polleres, Axel; Straccia, Umberto

    2011-01-01

    We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increased amount of inconsistent and non-reliable meta-data on the web. We formalise the annotated language, the corresponding deductive system and address the query answering problem. Previous contributions on specific RDF annotation domains are encompassed by our unified reasoning formalism as we show by instantiating it on (i) temporal, (ii) fuzzy, and (iii) provenance annotations. Moreover, we provide a generic method for combining multiple annotation domains allowing to represent, e.g. temporally-annotated fuzzy RDF. Furthermore, we address the development of a query language -- AnQL -- that is inspired by SPARQL, including several features of SPARQL 1.1 (subqueries, aggregates, assignment, solution modifiers) along with the formal definitions of their semantics.

  5. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  6. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
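
    The notion of a weighted consensus can be illustrated with a deliberately simplified sketch. This is not EVM's algorithm: it only sums a per-source weight for each candidate gene structure and picks the highest-scoring one. The source names, weights and the function `weighted_consensus` are hypothetical.

```python
from collections import defaultdict

def weighted_consensus(evidence, weights):
    """Pick the candidate structure with the largest summed evidence weight.

    evidence: list of (source_name, structure) pairs, where structure is a
              tuple of (start, end) exon coordinates.
    weights:  dict mapping source_name to a numeric weight (hypothetical).
    Returns (structure, score), or None when there is no evidence.
    """
    scores = defaultdict(float)
    for source, structure in evidence:
        scores[structure] += weights.get(source, 0.0)
    if not scores:
        return None
    return max(scores.items(), key=lambda item: item[1])

# Two alignment-based sources agree on a longer second exon.
evidence = [
    ("ab_initio", ((100, 200), (300, 400))),
    ("protein_alignment", ((100, 200), (300, 450))),
    ("transcript_alignment", ((100, 200), (300, 450))),
]
weights = {"ab_initio": 1.0, "protein_alignment": 2.0, "transcript_alignment": 5.0}
print(weighted_consensus(evidence, weights))
```

    Giving alignment-derived evidence more weight than ab initio predictions is one plausible configuration; a real system would score individual exons and introns rather than whole structures.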

  7. Semantic annotation of multilingual learning objects based on a domain ontology

    OpenAIRE

    Knoth, Petr

    2009-01-01

    One of the important tasks in the use of learning resources in e-learning is the necessity to annotate learning objects with appropriate metadata. However, annotating resources by hand is time consuming and difficult. Here we explore the problem of automatic extraction of metadata for description of learning resources. First, theoretical constraints for gathering certain types of metadata important for e-learning systems are discussed. Our approach to annotation is then outlined. This is base...

  8. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files

    OpenAIRE

    Steven N Hart; Duffy, Patrick; Quest, Daniel J.; Hossain, Asif; Meiners, Mike A; Kocher, Jean-Pierre

    2015-01-01

    Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already ...
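
    As an illustration of where such annotations live, the sketch below splits one VCF data line into its fixed columns and parses the INFO column, which holds semicolon-separated `key=value` entries and bare flags. It is a minimal reader written for this example, not part of VCF-Miner; the function name and the sample line are hypothetical.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    annotations = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # key=value entries become lists of strings; bare keys are flags.
            annotations[key] = value.split(",") if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "filter": filt,
        "info": annotations,
    }

# Hypothetical record with a depth, two allele frequencies and a flag.
record = parse_vcf_line("1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=100;AF=0.5,0.1;DB")
print(record["info"])
```

    Values are kept as strings because their types are declared in the file's header lines, which this sketch does not read.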

  9. A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data

    OpenAIRE

    Zimmermann, Antoine; Lopes, Nuno; Polleres, Axel; Straccia, Umberto

    2011-01-01

    We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increased amount of inconsistent and non-reliable meta-data on the web. We formalise the annotated language, the corresponding deductive system and address the query answering problem. Previous contributions on specific RDF annotation domains are encompassed by our unified reasoning formalism as we show by instantiating it on (i) temporal, (ii) fuzzy,...

  10. Image Auto-annotation using 'Easy' and 'More Challenging' Training Sets

    OpenAIRE

    Tang, Jiayu; Lewis, Paul H.

    2006-01-01

    The Corel Image set is widely used for image annotation performance evaluation although it has been claimed that the set is easy to annotate. The aim of this paper is to demonstrate some of the disadvantages of sets like the Corel set for effective auto-annotation evaluation. We first compare the performance of several annotation algorithms using the Corel set and find that simple near neighbour propagation techniques perform almost as well as the best of the more sophisticated algorithms. ...

  11. Comparative validation of the D. melanogaster modENCODE transcriptome annotation

    OpenAIRE

    Chen, Zhen-Xia; Sturgill, David; Qu, Jiaxin; Jiang, Huaiyang; Park, Soo; Boley, Nathan; Suzuki, Ana Maria; Anthony R. Fletcher; David C Plachetzki; FitzGerald, Peter C.; Artieri, Carlo G.; Atallah, Joel; Barmina, Olga; Brown, James B.; Blankenburg, Kerstin P

    2014-01-01

    Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the ...

  12. User friendly signal processing web services for annotators in AVATecH and AUVIS

    OpenAIRE

    Auer, E.

    2013-01-01

    User friendly signal processing web services: The joint Max Planck Fraunhofer project AVATecH aims to support the very time intensive work of annotating audio and video recordings, letting signal processing modules (recognizers) assist annotators. We designed a small, flexible framework where XML metadata describes input, output and settings of recognizers. Building blocks are audio and video files, annotation tiers and numerical data, packaged in simple formats. Text pipes allow flexibil...

  13. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    Science.gov (United States)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation performed on the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation and compared the results to the reference annotation.
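
    One way to quantify discrepancies between predicted and reference annotations is nucleotide-level agreement. The sketch below is an illustration only: the abstract does not say which metrics the authors used, and the tool names, coordinates, and function names are hypothetical.

```python
def covered_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_agreement(predicted, reference):
    """Return (sensitivity, precision) of predicted bases against reference bases."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    shared = len(pred & ref)
    # Sensitivity: fraction of reference bases recovered by the prediction.
    sensitivity = shared / len(ref) if ref else 0.0
    # Precision: fraction of predicted bases that lie in the reference.
    precision = shared / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical exon coordinates on a single contig.
reference = [(101, 200), (301, 400)]
predictions = {
    "tool_a": [(101, 200), (301, 350)],
    "tool_b": [(151, 250)],
}

for tool, exons in predictions.items():
    sens, prec = nucleotide_agreement(exons, reference)
    print(f"{tool}: sensitivity={sens:.2f} precision={prec:.2f}")
```

    Expanding intervals into position sets keeps the sketch short; for genome-scale data an interval-based overlap computation would be the more practical choice.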

  14. Annotation and sequence diversity of transposable elements in common bean (Phaseolus vulgaris)

    Directory of Open Access Journals (Sweden)

    Scott Jackson

    2014-07-01

    Common bean (Phaseolus vulgaris) is an important legume crop grown and consumed worldwide. With the availability of the common bean genome sequence, the next challenge is to annotate the genome and characterize functional DNA elements. Transposable elements (TEs) are the most abundant component of plant genomes and can dramatically affect genome evolution and genetic variation. Thus, it is pivotal to identify TEs in the common bean genome. In this study, we performed a genome-wide transposon annotation in common bean using a combination of homology and sequence structure-based methods. We developed a 2.12-Mb transposon database which includes 791 representative transposon sequences and is available upon request or from www.phytozome.org. Of note, nearly all transposons in the database are previously unrecognized TEs. More than 5,000 transposon-related expressed sequence tags (ESTs) were detected, which indicates that some transposons may be transcriptionally active. Two Ty1-copia retrotransposon families were found to encode the envelope-like protein, which has rarely been identified in plant genomes. Also, we identified an extra open reading frame (ORF), termed ORF2, from 15 Ty3-gypsy families that was located between the ORF encoding the retrotransposase and the 3' LTR. The ORF2 was in opposite transcriptional orientation to the retrotransposase. Sequence homology searches and phylogenetic analysis suggested that the ORF2 may have an ancient origin, but its function is not clear. These transposon data provide a useful resource for understanding genome organization and evolution and may be used to identify active TEs for developing a transposon-tagging system in common bean and other related genomes.

  15. Generation, annotation and analysis of ESTs from Trichoderma harzianum CECT 2413

    Directory of Open Access Journals (Sweden)

    Gutiérrez Santiago

    2006-07-01

    Background: The filamentous fungus Trichoderma harzianum is used as a biological control agent against several plant-pathogenic fungi. In order to study the genome of this fungus, a functional genomics project called "TrichoEST" was developed to give insights into genes involved in biological control activities using an approach based on the generation of expressed sequence tags (ESTs). Results: Eight different cDNA libraries from T. harzianum strain CECT 2413 were constructed. Different growth conditions, mainly involving different nutrient conditions and/or stresses, were used. We here present the analysis of the 8,710 ESTs generated. A total of 3,478 unique sequences were identified, of which 81.4% had sequence similarity with GenBank entries, using the BLASTX algorithm. Using the Gene Ontology hierarchy, we annotated 51.1% of the unique sequences and compared their distribution among the gene libraries. Additionally, the InterProScan algorithm was used to further characterize the sequences. The putatively secreted proteins were also identified. Based on EST abundance, we then examined the highly expressed genes, and a hydrophobin was identified as the gene expressed at the highest level. We compared our collection of ESTs with previous collections obtained from Trichoderma species, and we also compared our sequence set with different complete eukaryotic genomes from several animals, plants and fungi. Accordingly, the presence of similar sequences in different kingdoms was also studied. Conclusion: This EST collection and its annotation provide a significant resource for basic and applied research on T. harzianum, a fungus of high biotechnological interest.

  16. Ontology-Based Semantic Annotation for Problem Set Archives in the Web

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    To address the difficulty of obtaining semantic information from each problem in problem set archives, we propose a new method of ontology-based semantic annotation for problem set archives, which uses a programming knowledge domain ontology to add semantic annotations to problems on the Web. The system we developed adds a semantic annotation to each problem in the form of Extensible Markup Language (XML). Our method overcomes the difficulty of extracting semantics from problem set archives, and its efficiency is demonstrated through a case study. With semantic annotations of problems available, a student can efficiently locate the problems that logically correspond to his or her knowledge.

  17. Design and Evaluation of Data Annotation Workflows for CAVE-like Virtual Environments.

    Science.gov (United States)

    Pick, Sebastian; Weyers, Benjamin; Hentschel, Bernd; Kuhlen, Torsten W

    2016-04-01

    Data annotation finds increasing use in Virtual Reality applications with the goal to support the data analysis process, such as architectural reviews. In this context, a variety of different annotation systems for application to immersive virtual environments have been presented. While many interesting interaction designs for the data annotation workflow have emerged from them, important details and evaluations are often omitted. In particular, we observe that the process of handling metadata to interactively create and manage complex annotations is often not covered in detail. In this paper, we strive to improve this situation by focusing on the design of data annotation workflows and their evaluation. We propose a workflow design that facilitates the most important annotation operations, i.e., annotation creation, review, and modification. Our workflow design is easily extensible in terms of supported annotation and metadata types as well as interaction techniques, which makes it suitable for a variety of application scenarios. To evaluate it, we have conducted a user study in a CAVE-like virtual environment in which we compared our design to two alternatives in terms of a realistic annotation creation task. Our design obtained good results in terms of task performance and user experience.

  18. IMG ER: A System for Microbial Genome Annotation Expert Review and Curation

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Mavromatis, Konstantinos; Ivanova, Natalia N.; Chen, I-Min A.; Chu, Ken; Kyrpides, Nikos C.

    2009-05-25

    A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.

  19. The DOE-JGI Standard Operating Procedure for the Annotations of the Microbial Genomes

    OpenAIRE

    Mavromatis, Konstantinos

    2010-01-01

    The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes towards comparative analysis with the Integrated Microbial Genome (IMG) system. DOE-JGI MAP annotation is applied on nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG ER submission site. Users can submit the sequence datasets consisting of one or more contigs in a multi-fasta file. DOE-JGI MAP annotation includes predic...

  20. The DOE-JGI Standard Operating Procedure for the Annotations of Microbial Genomes.

    Science.gov (United States)

    Mavromatis, Konstantinos; Ivanova, Natalia N; Chen, I-Min A; Szeto, Ernest; Markowitz, Victor M; Kyrpides, Nikos C

    2009-01-01

    The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes towards comparative analysis with the Integrated Microbial Genome (IMG) system. DOE-JGI MAP annotation is applied on nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG ER submission site. Users can submit the sequence datasets consisting of one or more contigs in a multi-fasta file. DOE-JGI MAP annotation includes prediction of protein coding and RNA genes, as well as repeats and assignment of product names to these genes. PMID:21304638
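
    The submission format described above, one or more contigs in a multi-FASTA file, can be read with a few lines of code. The following is a minimal sketch of a generic FASTA reader, not part of DOE-JGI MAP or the IMG ER submission site; the function name and the contig names are hypothetical.

```python
def read_multi_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-contig submission.
example = """>contig_1
ACGTACGT
ACGT
>contig_2
GGCCTTAA
"""

records = list(read_multi_fasta(example.splitlines()))
for name, sequence in records:
    print(name, len(sequence))
```

    Passing an open file object instead of `example.splitlines()` works the same way, since the function only iterates over lines.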