WorldWideScience

Sample records for annotation est-ssr characterization

  1. De novo assembly of transcriptome sequencing in Caragana korshinskii Kom. and characterization of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Yan Long

    Full Text Available Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no previous genomic information or EST-SSR marker has been reported in Caragana Fabr. genus. In this study, more than two billion bases of high-quality sequence of C. korshinskii were generated by using illumina sequencing technology and demonstrated the de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp. The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarities to known proteins from NCBI non-redundant and Swissprot protein databases, respectively. Among these annotated unigenes, 26,232 a unigenes were separately assigned to Gene Ontology (GO database. When 22,756 unigenes searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG database, 5,598 unigenes were assigned to 5 main categories including 32 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (2,862, 43.7%, suggesting the active metabolic processes in the desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characterizations of EST-SSRs were further compared with other four species in Fabraceae. 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among the 9 germplasms in Caranaga Fabr. genus, PCR success rate were 93.7% and the phylogenic tree was constructed based on the genotypic data. This research generated a substantial fraction of transcriptome sequences, which were very useful resources for gene annotation and discovery, molecular markers development, genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding.

  2. Characterization and Development of EST-SSR Markers Derived from Transcriptome of Yellow Catfish

    Directory of Open Access Journals (Sweden)

    Jin Zhang

    2014-10-01

    Full Text Available Yellow catfish (Pelteobagrus fulvidraco is one of the most important freshwater fish due to its delicious flesh and high nutritional value. However, lack of sufficient simple sequence repeat (SSR markers has hampered the progress of genetic selection breeding and molecular research for yellow catfish. To this end, we aimed to develop and characterize polymorphic expressed sequence tag (EST–SSRs from the 454 pyrosequencing transcriptome of yellow catfish. Totally, 82,794 potential EST-SSR markers were identified and distributed in the coding and non-coding regions. Di-nucleotide (53,933 is the most abundant motif type, and AC/GT, AAT/ATT, AAAT/ATTT are respective the most frequent di-, tri-, tetra-nucleotide repeats. We designed primer pairs for all of the identified EST-SSRs and randomly selected 300 of these pairs for further validation. Finally, 263 primer pairs were successfully amplified and 57 primer pairs were found to be consistently polymorphic when four populations of 48 individuals were tested. The number of alleles for the 57 loci ranged from 2 to 17, with an average of 8.23. The observed heterozygosity (HO, expected heterozygosity (HE, polymorphism information content (PIC and fixation index (fis values ranged from 0.04 to 1.00, 0.12 to 0.92, 0.12 to 0.91 and −0.83 to 0.93, respectively. These EST-SSR markers generated in this study could greatly facilitate future studies of genetic diversity and molecular breeding in yellow catfish.

  3. Characterization of variable EST SSR markers for Norway spruce (Picea abies L.

    Directory of Open Access Journals (Sweden)

    Spiess Nadine

    2011-10-01

    Full Text Available Abstract Background Norway spruce is widely distributed across Europe and the predominant tree of the Alpine region. Fast growth and the fact that timber can be harvested cost-effectively in relatively young populations define its status as one of the economically most important tree species of Northern Europe. In this study, EST derived simple sequence repeat (SSR markers were developed for the assessment of putative functional diversity in Austrian Norway spruce stands. Results SSR sequences were identified by analyzing 14,022 publicly available EST sequences. Tri-nucleotide repeat motifs were most abundant in the data set followed by penta- and hexa-nucleotide repeats. Specific primer pairs were designed for sixty loci. Among these, 27 displayed polymorphism in a testing population of 16 P. abies individuals sampled across Austria and in an additional screening population of 96 P. abies individuals from two geographically distinct Austrian populations. Allele numbers per locus ranged from two to 17 with observed heterozygosity ranging from 0.075 to 0.99. Conclusions We have characterized variable EST SSR markers for Norway spruce detected in expressed genes. Due to their moderate to high degree of variability in the two tested screening populations, these newly developed SSR markers are well suited for the analysis of stress related functional variation present in Norway spruce populations.

  4. Genetic characterization of an elite coffee germplasm assessed by gSSR and EST-SSR markers.

    Science.gov (United States)

    Missio, R F; Caixeta, E T; Zambolim, E M; Pena, G F; Zambolim, L; Dias, L A S; Sakiyama, N S

    2011-10-06

    Coffee is one of the main agrifood commodities traded worldwide. In 2009, coffee accounted for 6.1% of the value of Brazilian agricultural production, generating a revenue of US$6 billion. Despite the importance of coffee production in Brazil, it is supported by a narrow genetic base, with few accessions. Molecular differentiation and diversity of a coffee breeding program were assessed with gSSR and EST-SSR markers. The study comprised 24 coffee accessions according to their genetic origin: arabica accessions (six traditional genotypes of C. arabica), resistant arabica (six leaf rust-resistant C. arabica genotypes with introgression of Híbrido de Timor), robusta (five C. canephora genotypes), Híbrido de Timor (three C. arabica x C. canephora), triploids (three C. arabica x C. racemosa), and racemosa (one C. racemosa). Allele and polymorphism analysis, AMOVA, the Student t-test, Jaccard's dissimilarity coefficient, cluster analysis, correlation of genetic distances, and discriminant analysis, were performed. EST-SSR markers gave 25 exclusive alleles per genetic group, while gSSR showed 47, which will be useful for differentiating accessions and for fingerprinting varieties. The gSSR markers detected a higher percentage of polymorphism among (35% higher on average) and within (42.9% higher on average) the genetic groups, compared to EST-SSR markers. The highest percentage of polymorphism within the genetic groups was found with gSSR markers for robusta (89.2%) and for resistant arabica (39.5%). It was possible to differentiate all genotypes including the arabica-related accessions. Nevertheless, combined use of gSSR and EST-SSR markers is recommended for coffee molecular characterization, because EST-SSRs can provide complementary information.

  5. Development and Characterization of 37 Novel EST-SSR Markers in Pisum sativum (Fabaceae

    Directory of Open Access Journals (Sweden)

    Xiaofeng Zhuang

    2013-01-01

    Full Text Available Premise of the study: Simple sequence repeat markers were developed based on expressed sequence tags (EST-SSR and screened for polymorphism among 23 Pisum sativum individuals to assist development and refinement of pea linkage maps. In particular, the SSR markers were developed to assist in mapping of white mold disease resistance quantitative trait loci. Methods and Results: Primer pairs were designed for 46 SSRs identified in EST contiguous sequences assembled from a 454 pyrosequenced transcriptome of the pea cultivar, ‘LIFTER’. Thirty-seven SSR markers amplified PCR products, of which 11 (30% SSR markers produced polymorphism in 23 individuals, including parents of recombinant inbred lines, with two to four alleles. The observed and expected heterozygosities ranged from 0 to 0.43 and from 0.31 to 0.83, respectively. Conclusions: These EST-SSR markers for pea will be useful for refinement of pea linkage maps, and will likely be useful for comparative mapping of pea and as tools for marker-based pea breeding.

  6. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.

    Directory of Open Access Journals (Sweden)

    Yang Jun-Bo

    2010-12-01

    Full Text Available Abstract Background The castor bean (Ricinus communis L., a monotypic species in the spurge family (Euphorbiaceae, 2n = 20, is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85% in its seed oil, the castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesive and biodiesel. Due to lack of efficient molecular markers, little is known about the population genetic diversity and the genetic relationships among castor bean germplasm. Efficient and robust molecular markers are increasingly needed for breeding and improving varieties in castor bean. The advent of modern genomics has produced large amounts of publicly available DNA sequence data. In particular, expressed sequence tags (ESTs provide valuable resources to develop gene-associated SSR markers. Results In total, 18,928 publicly available non-redundant castor bean EST sequences, representing approximately 17.03 Mb, were evaluated and 7732 SSR sites in 5,122 ESTs were identified by data mining. Castor bean exhibited considerably high frequency of EST-SSRs. We developed and characterized 118 polymorphic EST-SSR markers from 379 primer pairs flanking repeats by screening 24 castor bean samples collected from different countries. A total of 350 alleles were identified from 118 polymorphic SSR loci, ranging from 2-6 per locus (A with an average of 2.97. The EST-SSR markers developed displayed moderate gene diversity (He with an average of 0.41. Genetic relationships among 24 germplasms were investigated using the genotypes of 350 alleles, showing geographic pattern of genotypes across genetic diversity centers of castor bean. Conclusion Castor bean EST sequences exhibited considerably high frequency of SSR sites, and were rich resources for developing EST-SSR markers. These EST-SSR markers would be particularly

  7. Development and Characterization of 25 EST-SSR markers in Pinus sylvestris var. mongolica (Pinaceae

    Directory of Open Access Journals (Sweden)

    Pan Fang

    2014-01-01

    Full Text Available Premise of the study: A set of novel expressed sequence tag (EST microsatellite markers was developed in Pinus sylvestris var. mongolica to promote further genetic studies in this species. Methods and Results: One hundred seventy-five EST–simple sequence repeat (SSR primers were designed and synthesized for 31,653 isotigs based on P. tabuliformis EST sequences. The primer pairs were used to identify 25 polymorphic loci in 48 individuals. The number of alleles ranged from two to eight with observed and expected heterozygosity values of 0.0435 to 0.8125 and 0.0430 to 0.7820, respectively. Conclusions: These new polymorphic EST-SSR markers will be useful for assessing genetic diversity, molecular breeding and genetic improvement, and conservation of P. sylvestris var. mongolica.

  8. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development.

    Science.gov (United States)

    Wei, Zunzheng; Sun, Zhenzhen; Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian; Zhou, Di

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. 'Rehmannii' using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia.

  9. De novo assembly and characterization of the floral transcriptome of an economically important tree species, Lindera glauca (Lauraceae), including the development of EST-SSR markers for population genetics.

    Science.gov (United States)

    Zhu, Shanshan; Ding, Yanqian; Yap, Zhaoyan; Qiu, Yingxiong

    2016-11-01

    Lindera glauca (Lauraceae) is an economically important East Asian forest tree characterized by a dioecy in China and apomixis in Japan. However, patterns of population genetic diversity and structure of this species remain unknown for this species due to a lack of efficient molecular markers. In this study, we employed Illumina sequencing to analyze the transcriptomes of the female and male flower buds of L. glauca. We retrieved 59,753 and 75,075 unigenes for the female and male buds, respectively. Based on sequence similarity, 44,379 (74.27 %) unigenes for the female and 45,414 (60.49 %) unigenes for the male were matched to public databases. We identified 11,127 putative differentially expressed genes between the female and male buds and 20,048 expressed sequence tag-simple sequence repeats (EST-SSRs). From 3147 primer pairs designed successfully, 120 were selected for validation of polymorphism, and 13 could reliably amplify polymorphic bands and exhibited moderate levels of genetic diversity (e.g., N A = 4.42; H E = 0.56) when surveyed across 96 individuals of altogether six L. glauca populations from China and Japan. One of the three population genetic clusters identified in China was fixed in Japan, suggesting a historical population bottleneck following island immigration. The present study has generated a wealth of transcriptome data for future functional genomic research focused on the variable reproductive system of L. glauca (dioecy, apomixis) as well as EST-SSR markers for population genetics studies and its intriguing evolutionary shift from dioecy to apomixis in the wake of island colonization.

  10. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR Marker Resources for Diversity Analysis of Mango (Mangifera indica L.

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs generated from five mango (Mangifera indica L. cDNA libraries was mined for EST-based simple sequence repeat (SSR markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties, and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.

  11. Development, cross-species/genera transferability of novel EST-SSR markers and their utility in revealing population structure and genetic diversity in sugarcane

    KAUST Repository

    Singh, Ram K.

    2013-07-01

    Sugarcane (Saccharum spp. hybrid) with complex polyploid genome requires a large number of informative DNA markers for various applications in genetics and breeding. Despite the great advances in genomic technology, it is observed in several crop species, especially in sugarcane, the availability of molecular tools such as microsatellite markers are limited. Now-a-days EST-SSR markers are preferred to genomic SSR (gSSR) as they represent only the functional part of the genome, which can be easily associated with desired trait. The present study was taken up with a new set of 351 EST-SSRs developed from the 4085 non redundant EST sequences of two Indian sugarcane cultivars. Among these EST-SSRs, TNR containing motifs were predominant with a frequency of 51.6%. Thirty percent EST-SSRs showed homology with annotated protein. A high frequency of SSRs was found in the 5\\'UTR and in the ORF (about 27%) and a low frequency was observed in the 3\\'UTR (about 8%). Two hundred twenty-seven EST-SSRs were evaluated, in sugarcane, allied genera of sugarcane and cereals, and 134 of these have revealed polymorphism with a range of PIC value 0.12 to 0.99. The cross transferability rate ranged from 87.0% to 93.4% in Saccharum complex, 80.0% to 87.0% in allied genera, and 76.0% to 80.0% in cereals. Cloning and sequencing of EST-SSR size variant amplicons revealed that the variation in the number of repeat-units was the main source of EST-SSR fragment polymorphism. When 124 sugarcane accessions were analyzed for population structure using model-based approach, seven genetically distinct groups or admixtures thereof were observed in sugarcane. Results of principal coordinate analysis or UPGMA to evaluate genetic relationships delineated also the 124 accessions into seven groups. Thus, a high level of polymorphism adequate genetic diversity and population structure assayed with the EST-SSR markers not only suggested their utility in various applications in genetics and genomics in

  12. Genetic diversity of Phytophthora sojae isolates in Heilongjiang Province in China assessed by RAPD and EST-SSR

    Science.gov (United States)

    Wu, J. J.; Xu, P. F.; Liu, L. J.; Wang, J. S.; Lin, W. G.; Zhang, S. Z.; Wei, L.

    Random-amplified polymorphic DNA (RAPD) and EST-SSR markers were used to estimate the genetic relationship among thirty-nine P.sojae isolates from three locations in Heilongjiang Province, and nine isolates from Ohio in America were made as reference strains. 10 of 50 RAPD primers and 5 of 33 EST-SSR were polymorphic across 48 P.sojae isolates. Similarity values among P.sojae isolates were from 49% to 82% based on the RAPD data. The similarities based on EST-SSR markers ranged from 47% to 85%. The genetic diversity revealed by EST-SSR marker analysis was higher than that obtained from RAPD. The similarity matrices for the SSR data and the RAPD data were moderately correlated (r = 0.47). Genetic similarity coefficients were also relatively lower, which demonstrated complicated genetic background within each location. The high similarity values range revealed the ability of RAPD/EST-SSR markers to distinguish even among morphological similar phytophthora.

  13. Assessment of genetic diversity in the sorghum reference set using EST-SSR markers.

    Science.gov (United States)

    Ramu, P; Billot, C; Rami, J-F; Senthilvel, S; Upadhyaya, H D; Ananda Reddy, L; Hash, C T

    2013-08-01

    Selection and use of genetically diverse genotypes are key factors in any crop breeding program to develop cultivars with a broad genetic base. Molecular markers play a major role in selecting diverse genotypes. In the present study, a reference set representing a wide range of sorghum genetic diversity was screened with 40 EST-SSR markers to validate both the use of these markers for genetic structure analyses and the population structure of this set. Grouping of accessions is identical in distance-based and model-based clustering methods. Genotypes were grouped primarily based on race within the geographic origins. Accessions derived from the African continent contributed 88.6 % of alleles confirming the African origin of sorghum. In total, 360 alleles were detected in the reference set with an average of 9 alleles per marker. The average PIC value was 0.5230 with a range of 0.1379-0.9483. Sub-race, guinea margaritiferum (Gma) from West Africa formed a separate cluster in close proximity to wild accessions suggesting that the Gma group represents an independent domestication event. Guineas from India and Western Africa formed two distinct clusters. Accessions belongs to the kafir race formed the most homogeneous group as observed in earlier studies. This analysis suggests that the EST-SSR markers used in the present study have greater discriminating power than the genomic SSRs. Genetic variance within the subpopulations was very high (71.7 %) suggesting that the germplasm lines included in the set are more diverse. Thus, this reference set representing the global germplasm is an ideal material for the breeding community, serving as a community resource for trait-specific allele mining as well as genome-wide association mapping.

  14. Genetic Variation in Triticum turgidum L.ssp.turgidum Landraces from China Assessed by EST-SSR Markers

    Institute of Scientific and Technical Information of China (English)

    LI Wei; DONG Pan; WEI Yu-ming; CHENG Guo-yue; ZHENG You-liang

    2008-01-01

    It was helpful for the wheat improvement to evaluate the genetic resources of Triticum turgidum L.ssp.turgidum landraces.In this study,68 turgidum landraces accessions,belonging to four geographic populations in China,were investigated by using EST-SSR markers.A total of 63 alleles were detected on 22 EST-SSR loci,and the number of alleles on each locus ranged from 1 to 5,with an average of 2.9.The results of the analysis of molecular variance (AMOVA)indicated that 92.5% of the total variations was attributed to the genetic variations within population,whereas only 7.5%variations among populations.Although the four populations had similar genetic diversity parameters,Sichuan population was yet distinguished from other populations when comparing the population samples in pairs.Significant correlations were detected by the statistic analysis among six genetic diversity parameters among each other.The selection difference between heterozygosty and homozygosty was also observed among different EST-SSR locus.The genetic similarity (GS)ranged from 0.18 to 0.98,with the mean of 0.72,and all accessions could be clustered into 7 groups.The dendrogram suggested that the genetic relationships among turgidum accessions evaluated by EST-SSR markers were unrelated to their geographic distributions.These results implied that turgidum landraces from China had the unique characters of genetic diversity.

  15. tropiTree: an NGS-based EST-SSR resource for 24 tropical tree species.

    Science.gov (United States)

    Russell, Joanne R; Hedley, Peter E; Cardle, Linda; Dancey, Siobhan; Morris, Jenny; Booth, Allan; Odee, David; Mwaura, Lucy; Omondi, William; Angaine, Peter; Machua, Joseph; Muchugi, Alice; Milne, Iain; Kindt, Roeland; Jamnadass, Ramni; Dawson, Ian K

    2014-01-01

    The development of genetic tools for non-model organisms has been hampered by cost, but advances in next-generation sequencing (NGS) have created new opportunities. In ecological research, this raises the prospect for developing molecular markers to simultaneously study important genetic processes such as gene flow in multiple non-model plant species within complex natural and anthropogenic landscapes. Here, we report the use of bar-coded multiplexed paired-end Illumina NGS for the de novo development of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers at low cost for a range of 24 tree species. Each chosen tree species is important in complex tropical agroforestry systems where little is currently known about many genetic processes. An average of more than 5,000 EST-SSRs was identified for each of the 24 sequenced species, whereas prior to analysis 20 of the species had fewer than 100 nucleotide sequence citations. To make results available to potential users in a suitable format, we have developed an open-access, interactive online database, tropiTree (http://bioinf.hutton.ac.uk/tropiTree), which has a range of visualisation and search facilities, and which is a model for the efficient presentation and application of NGS data.

  16. SSR and EST-SSR-based genetic linkage map of cassava (Manihot esculenta Crantz).

    Science.gov (United States)

    Sraphet, Supajit; Boonchanawiwat, Athipong; Thanyasiriwat, Thanwanit; Boonseng, Opas; Tabata, Satoshi; Sasamoto, Shigemi; Shirasawa, Kenta; Isobe, Sachiko; Lightfoot, David A; Tangphatsornruang, Sithichoke; Triwitayakorn, Kanokporn

    2011-04-01

    Simple sequence repeat (SSR) markers provide a powerful tool for genetic linkage map construction that can be applied for identification of quantitative trait loci (QTL). In this study, a total of 640 new SSR markers were developed from an enriched genomic DNA library of the cassava variety 'Huay Bong 60' and 1,500 novel expressed sequence tag-simple sequence repeat (EST-SSR) loci were developed from the Genbank database. To construct a genetic linkage map of cassava, a 100 F(1) line mapping population was developed from the cross Huay Bong 60 by 'Hanatee'. Polymorphism screening between the parental lines revealed that 199 SSRs and 168 EST-SSRs were identified as novel polymorphic markers. Combining with previously developed SSRs, we report a linkage map consisted of 510 markers encompassing 1,420.3 cM, distributed on 23 linkage groups with a mean distance between markers of 4.54 cM. Comparison analysis of the SSR order on the cassava linkage map and the cassava genome sequences allowed us to locate 284 scaffolds on the genetic map. Although the number of linkage groups reported here revealed that this F(1) genetic linkage map is not yet a saturated map, it encompassed around 88% of the cassava genome indicating that the map was almost complete. Therefore, sufficient markers now exist to encompass most of the genomes and efficiently map traits in cassava.

  17. Genetic diversity revealed by genomic-SSR and EST-SSR markers among common wheat, spelt and compactum

    Institute of Scientific and Technical Information of China (English)

    YANG Xinquan; LIU Peng; HAN Zongfu; NI Zhongfu; SUN Qixin

    2005-01-01

    In this study, two SSR molecular markers, named genomic-SSR and EST-SSR, are used to measure the genetic diversity among three hexaploid wheat populations, which include 28 common wheat ( Triticum aestivum L. ), 13 spelt ( Triticum spelta L. ),and 11 compactum ( Triticum compactum Host. ). The results show that common wheat has the highest genetic polymorphism, followed by spelt and then compactum. The mean genetic distance between the populations is higher than that within a population, and similar tendency is detected for individual genomes A, B and D. Therefore, spelt and compactum can be used as potential germplasms for wheat breeding, especially for enriching the genetic variation in genome D. As compared with spelt, the genetic diversity between common wheat and compactum is much smaller, indicating a closer consanguine relationship between these two species. Although the polymorphism revealed by EST-SSR is lower than that by genomic-SSR, it can effectively differentiate diverse genotypes as well. Together with our present results, it is concluded that EST-SSR marker is an ideal marker for assessing the genetic diversity in wheat. Meanwhile, the origin and evolution of hexaploid wheat is also analyzed and discussed.

  18. First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers

    Science.gov (United States)

    Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique

    2016-01-01

    Taraxacum koksaghyz Rodin (TKS) has been studied in many occasions as a possible alternative source for natural rubber production of good quality and for inulin production. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and inulin applications for health improvement and in the food industry. A limited amount of genomic resources exist for TKS and particularly no genetic linkage map is available in this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. Blasting the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. Blast searches against plant gene databases revealed some homologies with useful genes for downstream applications in the future. PMID:27488242

  19. Development of Soybean EST-SSR Markers and Their Use to Assess Genetic Diversity in the Subgenus Soja

    Institute of Scientific and Technical Information of China (English)

    LIU Yu-lin; LI Ying-hui; ZHOU Guo-an; Uzokwe N; CHANG Ru-zhen; CHEN Shou-yi; QIU Li-juan

    2010-01-01

    Developing expressed sequence tag-derived SSR (EST-SSR) markers is imperative in genetic research. In this paper, we reported 37 EST-SSR markers which were developed from 286 unigenes obtained from soybean eDNA library. Among the 286 markers designed for the 4 accessions of Glycine max and 6 of its wild progenitor (G. soja) within the subgenus Soja,209 markers amplified DNA fragments, taking 73.1% and 37 markers appeared to be polymorphic, which was 12.9% of the total. The 37 loci detected a total of 142 alleles, while the PIC values varied from 0.194 to 0.794. Both the number of alleles per locus and PIC value were significantly related to the SSR motif. Six EST-SSR loci may be fixed for different alleles between G. max and G. soja since they were particularly polymorphic among the 6 G. soja accessions. A neighbor-joining tree placed the G. max accessions together as a group within the G. soja, though the average genetic distance among G. soja accessions was much higher. These new EST-SSRs markers will be useful for genetic diversity analysis, genetic mapping construction and gene discovery in Soja subgenus.

  20. First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers.

    Science.gov (United States)

    Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique

    2016-08-04

    Taraxacum koksaghyz Rodin (TKS) has been studied in many occasions as a possible alternative source for natural rubber production of good quality and for inulin production. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and inulin applications for health improvement and in the food industry. A limited amount of genomic resources exist for TKS and particularly no genetic linkage map is available in this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. Blasting the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. Blast searches against plant gene databases revealed some homologies with useful genes for downstream applications in the future.

  1. An EST-SSR linkage map of Raphanus sativus and comparative genomics of the Brassicaceae.

    Science.gov (United States)

    Shirasawa, Kenta; Oyama, Maki; Hirakawa, Hideki; Sato, Shusei; Tabata, Satoshi; Fujioka, Takashi; Kimizuka-Takagi, Chiaki; Sasamoto, Shigemi; Watanabe, Akiko; Kato, Midori; Kishida, Yoshie; Kohara, Mitsuyo; Takahashi, Chika; Tsuruoka, Hisano; Wada, Tsuyuko; Sakai, Takako; Isobe, Sachiko

    2011-08-01

    Raphanus sativus (2n = 2x = 18) is a widely cultivated member of the family Brassicaceae, for which genomic resources are available only to a limited extent in comparison to many other members of the family. To promote more genetic and genomic studies and to enhance breeding programmes of R. sativus, we have prepared genetic resources such as complementary DNA libraries, expressed sequences tags (ESTs), simple sequence repeat (SSR) markers and a genetic linkage map. A total of 26 606 ESTs have been collected from seedlings, roots, leaves, and flowers, and clustered into 10 381 unigenes. Similarities were observed between the expression patterns of transcripts from R. sativus and those from representative members of the genera Arabidopsis and Brassica, indicating their functional relatedness. The EST sequence data were used to design 3800 SSR markers and consequently 630 polymorphic SSR loci and 213 reported marker loci have been mapped onto nine linkage groups, covering 1129.2 cM with an average distance of 1.3 cM between loci. Comparison of the mapped EST-SSR marker positions in R. sativus with the genome sequence of A. thaliana indicated that the Brassicaceae members have evolved from a common ancestor. It appears that genomic fragments corresponding to those of A. thaliana have been doubled and tripled in R. sativus. The genetic map developed here is expected to provide a standard map for the genetics, genomics, and molecular breeding of R. sativus as well as of related species. The resources are available at http://marker.kazusa.or.jp/Daikon.

  2. Assessment of EST-SSR markers for genetic analisys on coffee Potencial de marcadores EST-SSR para análise genética em café

    Directory of Open Access Journals (Sweden)

    Robson Fernando Missio

    2009-09-01

    Full Text Available EST-SSR markers were used to investigate the genetic diversity among and within coffee populations, to explore the possibility of their use for fingerprinting of cultivars and to assist breeding programs. Seventeen markers, developed from ESTs (Expressed Sequence Tags from the Brazilian Coffee Genome Project, were used. All markers showed polymorphism among the genotypes assessed. The average number of allele per primer was 5.1. The highest polymorphisms were found within C. canephora (88.2% and rust-resistant varieties (35.3%. About 29.4% of the markers differentiated C. arabica from Híbrido de Timor; it was also possible to identify those closest and farthest from C. arabica . The analysis of population-grouped genotypes revealed a 64.0% genetic diversity among and a 36.0% genetic diversity within populations. The differentiation index was 0.637. Six markers distinguished four rust-resistance varieties, showing their fingerprinting potential. These results demonstrate the usefulness of EST-SSR markers for cross orientation, in diversity and introgression studies, and in genetic mapping.No estudo da diversidade genética entre e dentro de populações de café, foram usados marcadores EST-SSR, visando avaliar seu potencial para identificar cultivares comerciais e assistir programas de melhoramento. Os 17 marcadores utilizados foram desenvolvidos a partir das seqüências ESTs do Projeto Brasileiro do Genoma Café. Em todos os marcadores observou-se polimorfismo entre os genótipos avaliados, com um número médio de 5,1 alelos por primer. Os maiores polimorfismos foram constatados dentro de C. canephora (88,2% e em variedades resistentes à ferrugem (35,3%. Dos marcadores analisados, 29,4% distinguiram C. arabica dos Híbridos de Timor (HDT, sendo possível identificar os mais próximos e os mais distantes de C. arabica . A análise dos genótipos agrupados por população revelou diversidade genética de 64% entre populações e 36% dentro

  3. 马铃薯致病疫霉EST-SSR PCR反应体系的优化%Optimization of PCR system in EST-SSR analysis of Phytophthora infestans

    Institute of Scientific and Technical Information of China (English)

    王佳楠; 吕文河; 金光辉; 白雅梅; 李文霞; 韩英鹏

    2009-01-01

    由致病疫霉(Phytophthora infestans Montagne de Bary)引起的马铃薯晚疫病是危害马铃薯生产的最严重病害.试验以致病疫霉茵株HHO6-23为模板,以Pi08N为引物,研究致病疫霉EST-SSR PCR反应体系中各主要成分和退火温度对扩增结果的影响.优化后的PCR反应体系为:25 ng模板DNA,0.5 mmol·L-1dNTPs,2μL10×Buffer(Mg2+Free),1.75 mmol·L-1MgCl2,15 pmol引物,1.2 UTaq DNA聚合酶,加ddH2O至20μL;同时确定引物退火温度为63℃.PCR体系稳定性试验结果表明,优化后的致病疫霉EST-SSR PCR反应体系稳定、可靠,适合进行致病疫霉群体遗传多样性研究.

  4. Exploiting Illumina Sequencing for the Development of 95 Novel Polymorphic EST-SSR Markers in Common Vetch (Vicia sativa subsp. sativa

    Directory of Open Access Journals (Sweden)

    Zhipeng Liu

    2014-05-01

    Full Text Available The common vetch (Vicia sativa subsp. sativa, a self-pollinating and diploid species, is one of the most important annual legumes in the world due to its short growth period, high nutritional value, and multiple usages as hay, grain, silage, and green manure. The available simple sequence repeat (SSR markers for common vetch, however, are insufficient to meet the developing demand for genetic and molecular research on this important species. Here, we aimed to develop and characterise several polymorphic EST-SSR markers from the vetch Illumina transcriptome. A total number of 1,071 potential EST-SSR markers were identified from 1025 unigenes whose lengths were greater than 1,000 bp, and 450 primer pairs were then designed and synthesized. Finally, 95 polymorphic primer pairs were developed for the 10 common vetch accessions, which included 50 individuals. Among the 95 EST-SSR markers, the number of alleles ranged from three to 13, and the polymorphism information content values ranged from 0.09 to 0.98. The observed heterozygosity values ranged from 0.00 to 1.00, and the expected heterozygosity values ranged from 0.11 to 0.98. These 95 EST-SSR markers developed from the vetch Illumina transcriptome could greatly promote the development of genetic and molecular breeding studies pertaining to in this species.

  5. Construction of an EST-SSR-based interspecific transcriptome linkage map of fibre development in cotton

    Indian Academy of Sciences (India)

    Chuanxiang Liu; Daojun Yuan; Zhongxu Lin

    2014-12-01

    Quantitative trait locus (QTL) mapping is an important method in marker-assisted selection breeding. Many studies on the QTLs focus on cotton fibre yield and quality; however, most are conducted at the DNA level, which may reveal null QTLs. Hence, QTL mapping based on transcriptome maps at the cDNA level is often more reliable. In this study, an interspecific transcriptome map of allotetraploid cotton was developed based on an F2 population (Emian22 × 3-79) by amplifying cDNA using EST-SSRs. The map was constructed using cDNA obtained from developing fibres at five days post anthesis (DPA). A total of 1270 EST-SSRs were screened for polymorphisms between the mapping parents. The resulting transcriptome linkage map contained 242 markers that were distributed in 32 linkage groups (26 chromosomes). The full length of this map is 1938.72 cM with a mean marker distance of 8.01 cM. The functions of some ESTs have been annotated by exploring homologous sequences. Some markers were related to the differentiation and elongation of cotton fibre, while most were related to the basic metabolism. This study demonstrates that constructing a transcriptome linkage map by amplifying cDNAs using EST-SSRs is a simple and practical method as well as a powerful tool to map eQTLs for fibre quality and other traits in cotton.

  6. An EST-SSR based linkage map for Persea americana Mill. (avocado)

    Science.gov (United States)

    Recent enhancement of the pool of known molecular markers for avocado has allowed the construction of the first moderate density genetic map for this species. Over 300 microsatellite markers have been characterized and 163 of these were used to construct a map from the cross of two Florida cultivar...

  7. A deep sequencing analysis of transcriptomes and the development of EST-SSR markers in mungbean (Vigna radiata)

    Indian Academy of Sciences (India)

    CHANGYOU LIU; BAOJIE FAN; ZHIMIN CAO; QIUZHU SU; YAN WANG; ZHIXIAO ZHANG; JING WU; JING TIAN

    2016-09-01

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the three libraries were assembled into 56,471 transcripts. Following a cluster analysis, 43,293 unigenes were obtained with an average length of 739 bp and N50 length of 1176 bp. Of the unigenes, 34,903 (80.6%) had significant similarity to known proteins in the NCBI nonredundant protein database (Nr), while 21,450 (58.4%) had BLAST hits in the Swiss-Prot database (E-value < 10⁻⁵). Further, 1245 differential expression genes were detected among three mungbean genotypes. In addition, we identified 3788 expressed sequence tag-simple sequence repeat (EST-SSR) motifs that could be used as potential molecular markers. Among 320 tested loci, 310 (96.5%) yielded amplification products, and 151 (47.0%) exhibited polymorphisms among six mungbean accessions. These transcriptome data and mungbean EST-SSRs could serve as a valuable resource for novel gene discovery and the marker-assisted selective breeding of this specie

  8. Genome evolution of intermediate wheatgrass as revealed by EST-SSR markers developed from its three progenitor diploid species.

    Science.gov (United States)

    Wang, Richard R-C; Larson, Steve R; Jensen, Kevin B; Bushman, B Shaun; DeHaan, Lee R; Wang, Shuwen; Yan, Xuebing

    2015-02-01

    Intermediate wheatgrass (Thinopyrum intermedium (Host) Barkworth & D.R. Dewey), a segmental autoallohexaploid (2n = 6x = 42), is not only an important forage crop but also a valuable gene reservoir for wheat (Triticum aestivum L.) improvement. Throughout the scientific literature, there continues to be disagreement as to the origin of the different genomes in intermediate wheatgrass. Genotypic data obtained from newly developed EST-SSR primers derived from the putative progenitor diploid species Pseudoroegneria spicata (Pursh) Á. Löve (St genome), Thinopyrum bessarabicum (Savul. & Rayss) Á. Löve (J = J(b) = E(b)), and Thinopyrum elongatum (Host) D. Dewey (E = J(e) = E(e)) indicate that the V genome of Dasypyrum (Coss. & Durieu) T. Durand is not one of the three genomes in intermediate wheatgrass. Based on all available information in the literature and findings in this study, the genomic designation of intermediate wheatgrass should be changed to J(vs)J(r)St, where J(vs) and J(r) represent ancestral genomes of present-day J(b) of Th. bessarabicum and J(e) of Th. elongatum, with J(vs) being more ancient. Furthermore, the information suggests that the St genome in intermediate wheatgrass is most similar to the present-day St found in diploid species of Pseudoroegneria from Eurasia.

  9. Determination of the genetic diversity of vegetable soybean [Glycine max (L.) Merr.] using EST-SSR markers

    Institute of Scientific and Technical Information of China (English)

    Gu-wen ZHANG; Sheng-chun XU; Wei-hua MAO; Qi-zan HU; Ya-ming GONG

    2013-01-01

    The development of expressed sequence tag-derived simple sequence repeats (EST-SSRs) provided a useful tool for investigating plant genetic diversity.In the present study,22 polymorphic EST-SSRs from grain soybean were identified and used to assess the genetic diversity in 48 vegetable soybean accessions.Among the 22 EST-SSR loci,tri-nucleotides were the most abundant repeats,accounting for 50.00% of the total motifs.GAA was the most common motif among tri-nucleotide repeats,with a frequency of 18.18%.Polymorphic analysis identified a total of 71 alleles,with an average of 3.23 per locus.The polymorphism information content (PIC) values ranged from 0.144 to 0.630,with a mean of 0.386.Observed heterozygosity (Ho) values varied from 0.0196 to 1.0000,with an average of 0.6092,while the expected heterozygosity (He) values ranged from 0.1502 to 0.6840,with a mean value of 0.4616.Principal coordinate analysis and phylogenetic tree analysis indicated that the accessions could be assigned to different groups based to a large extent on their geographic distribution,and most accessions from China were clustered into the same groups.These results suggest that Chinese vegetable soybean accessions have a narrow genetic base.The results of this study indicate that EST-SSRs from grain soybean have high transferability to vegetable soybean,and that these new markers would be helpful in taxonomy,molecular breeding,and comparative mapping studies of vegetable soybean in the future.

  10. De novo transcriptomic analysis and development of EST-SSR markers in the Siberian tiger (Panthera tigris altaica).

    Science.gov (United States)

    Lu, Taofeng; Sun, Yujiao; Ma, Qin; Zhu, Minghao; Liu, Dan; Ma, Jianzhang; Ma, Yuehui; Chen, Hongyan; Guan, Weijun

    2016-12-01

    The Siberian tiger, Panthera tigris altaica, is an endangered species, and much more work is needed to protect this species, which is still vulnerable to extinction. Conservation efforts may be supported by the genetic assessment of wild populations, for which highly specific microsatellite markers are required. However, only a limited amount of genetic sequence data is available for this species. To identify the genes involved in the lung transcriptome and to develop additional simple sequence repeat (SSR) markers for the Siberian tiger, we used high-throughput RNA-Seq to characterize the Siberian tiger transcriptome in lung tissue (designated 'PTA-lung') and a pooled tissue sample (designated 'PTA'). Approximately 47.5 % (33,187/69,836) of the lung transcriptome was annotated in four public databases (Nr, Swiss-Prot, KEGG, and COG). The annotated genes formed a potential pool for gene identification in the tiger. An analysis of the genes differentially expressed in the PTA lung, and PTA samples revealed that the tiger may have suffered a series of diseases before death. In total, 1062 non-redundant SSRs were identified in the Siberian tiger transcriptome. Forty-three primer pairs were randomly selected for amplification reactions, and 26 of the 43 pairs were also used to evaluate the levels of genetic polymorphism. Fourteen primer pairs (32.56 %) amplified products that were polymorphic in size in P. tigris altaica. In conclusion, the transcriptome sequences will provide a valuable genomic resource for genetic research, and these new SSR markers comprise a reasonable number of loci for the genetic analysis of wild and captive populations of P. tigris altaica.

  11. Development of EST-SSR Markers in (Citrus aurantium L.)%积壳EST-SSR标记的开发

    Institute of Scientific and Technical Information of China (English)

    杨春霞; 温强; 叶金山; 朱培林

    2011-01-01

    The big collection of expressed sequence tags (ESTs) from Citrus aurantium L. is available in public database, and offers an opportunity to identify simple sequence repeats (SSR) in ESTs by data mining. These sequences may provide an estimate of diversity in the expressed portion of the genome and may be useful for comparative mapping, for tagging important traits of interest, and for additional map-based cloning of important genes.We analyzed fiontal 11 029 Unigene sequences fi.om C. aurantium database using online SSR identified software SSRIT. Totally, 327 ESTs with 348 SSRs were identified, and accounted for 2.96% of the total number of EST sequences. Trinucleotide repeats (a total of 161) were the most abundant repeat class, and accounted for 46.26% of all found SSRs. According to these EST sequences containing SSR, 58 primer pairs were designed using Primer 3.0. of these, 36 primer pairs amplified DNA fragments and 6 primer pairs exhibited polymorphism, account for 62.07% and 10.34% of the total designed 58 pairs. The development of new EST-SSR markers from C. aurantium has potential important implication for genetic analysis and exploitation of genetic resources of C. aurantium and would provide a more direct estimate of functional diversity.%GenBank上己公布的积壳EST序列为开发新的SSR标记提供了宝贵的数据资源.本研究利用在线SSR鉴定软件SSRIT分析来自积壳EST数据库的11029条Unigene序列.分析结果共发现327条EST序列含有348个SSR位点,占总数的2.96%.其中,三核苷酸重复的SSR类型最多,共有161个,占检索总数的46.26%. Primer 3.0设计合成58对EST-SSR引物,其中36对能扩增出产物,6对引物产生多态性分离,分别占所设计引物总数的62.07%和10.34%.本文研究成果为今后积壳遗传多样性分析、遗传图谱构建及比较基因组等研究方面奠定了基础.

  12. Development of a panel of unigene-derived polymorphic EST-SSR markers in lentil using public database information

    Institute of Scientific and Technical Information of China (English)

    Debjyoti Sen Gupta; Peng Cheng; Gaurav Sablok; Dil Thavarajah; Pushparajah Thavarajah; Clarice J Coyne; Shiv Kumar; Michael Baum; Rebecca J McGee

    2016-01-01

    Lentil (Lens culinaris Medik.), a diploid (2n=14) with a genome size greater than 4000 Mbp, is an important cool season food legume grown worldwide. The availability of genomic resources is limited in this crop species. The objective of this study was to develop polymorphic markers in lentil using publicly available curated expressed sequence tag information (ESTs). In this study, 9513 ESTs were downloaded from the National Center for Biotechnology Information (NCBI) database to develop unigene-based simple sequence repeat (SSR) markers. The ESTs were assembled into 4053 unigenes and then analyzed to identify 374 SSRs using the MISA microsatellite identification tool. Among the 374 SSRs, 26 compound SSRs were observed. Primer pairs for these SSRs were designed using Primer3 version 1.14. To classify the functional annotation of ESTs and EST–SSRs, BLASTx searches (using E-value 1 × 10−5) against the public UniProt (http://www.uniprot.org/) and NCBI (http://www.ncbi.nlh.nih.gov/) data-bases were performed. Further functional annotation was performed using PLAZA (version 3.0) comparative genomics and GO annotation was summarized using the Plant GO slim category. Among the synthesized 312 primers, 219 successfully amplified Lens DNA. A diverse panel of 24 Lens genotypes was used to identify polymorphic markers. A polymorphic set of 57 markers successfully discriminated the test genotypes. This set of polymorphic markers with functional annotation data could be used as molecular tools in lentil breeding.

  13. Utilização de marcadores moleculares RAPD e EST's SSR para estudo da variabilidade genética em cana-de-açúcar¹ Use of molecular markers RAPD, and ESTs SSR to study genetic variability in sugarcane

    Directory of Open Access Journals (Sweden)

    João Andrade Dutra Filho

    2013-03-01

    Full Text Available Marcadores moleculares dos tipos RAPD e EST's SSR foram utilizados como ferramentas para avaliar a variabilidade bem como estimar a divergência genética entre variedades comerciais e clones de cana-de-açúcar oriundos via autofecundação. Vinte e três genótipos foram utilizados neste estudo oriundos do Programa de Melhoramento Genético da Cana-de-açúcar da Rede Interuniversitária para o Desenvolvimento do Setor Sucroalcooleiro (PMGCA/RIDESA. A extração do DNA genômico seguiu a metodologia CTAB com modificações para cana-de-açúcar. Foram utilizados 11 oligonucleotídeos RAPD da operon Technologies e 7 EST's SRR obtidos através de uma extensa revisão de literatura. As análises da divergência genética foram realizadas com o auxílio do Programa GENES. Os marcadores RAPD detectaram um alto grau de polimorfismo genético, produzindo 61 bandas, das quais 58 foram polimórficas. Os marcadores EST's SSR amplificaram 38 alelos, sendo 34 polimórficos. Havendo a formação de três grupos, com a população estudada. A maior parte da variação genética foi mantida dentro das progênies, evidenciando a ocorrência de um alto grau de variabilidade genética entre os genótipos de cada progênie para fins de melhoramento. Através da divergência genética estimada foi possível identificar parentais divergentes para trabalhos de hibridação, visando a obtenção de clones superiores com caracteres de interesse à agroindústria canavieira. Os marcadores molecuares RAPD e EST's SSR foram igualmente eficientes para estimar a variabilidade genética nos genótipos testados e elaborar os cruzamentos a serem realizados nos programas de melhoramento.Molecular markers of the type RAPD and ESTs SSR were used as tools to evaluate the variability, and estimate the genetic divergence between commercial varieties and sugarcane clones from self-pollination. Twenty-three genotypes from the Program for the Genetic Improvement of Sugarcane of the

  14. De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila.

    Science.gov (United States)

    Jia, Huixia; Yang, Haifeng; Sun, Pei; Li, Jianbo; Zhang, Jin; Guo, Yinghua; Han, Xiaojiao; Zhang, Guosheng; Lu, Mengzhu; Hu, Jianjun

    2016-12-20

    Salix psammophila, a sandy shrub known as desert willow, is regarded as a potential biomass feedstock and plays an important role in maintaining local ecosystems. However, a lack of genomic data and efficient molecular markers limit the study of its population evolution and genetic breeding. In this study, chromosome counts, flow cytometry and SSR analyses indicated that S. psammophila is tetraploid. A total of 6,346 EST-SSRs were detected based on 71,458 de novo assembled unigenes from transcriptome data. Twenty-seven EST-SSR markers were developed to evaluate the genetic diversity and population structure of S. psammophila from eight natural populations in Northern China. High levels of genetic diversity (mean 10.63 alleles per locus; mean HE 0.689) were dectected in S. psammophila. The weak population structure and little genetic differentiation (pairwise FST = 0.006-0.016) were found among Population 1-Population 7 (Pop1-Pop7; Inner Mongolia and Shaanxi), but Pop8 (Ningxia) was clearly separated from Pop1-Pop7 and moderate differentiation (pairwise FST = 0.045-0.055) was detected between them, which may be influenced by local habitat conditions. Molecular variance analyses indicated that most of the genetic variation (94.27%) existed within populations. These results provide valuable genetic informations for natural resource conservation and breeding programme optimisation of S. psammophila.

  15. Cross-Species, Amplifiable EST-SSR Markers for Amentotaxus Species Obtained by Next-Generation Sequencing

    Directory of Open Access Journals (Sweden)

    Chiuan-Yu Li

    2016-01-01

    Full Text Available Amentotaxus, a genus of Taxaceae, is an ancient lineage with six relic and endangered species. Four Amentotaxus species, namely A. argotaenia, A. formosana, A. yunnanensis, and A. poilanei, are considered a species complex because of their morphological similarities. Small populations of these species are allopatrically distributed in Asian forests. However, only a few codominant markers have been developed and applied to study population genetic structure of these endangered species. In this study, we developed and characterized polymorphic expressed sequence tag-simple sequence repeats (EST-SSRs from the transcriptome of A. formosana. We identified 4955 putative EST-SSRs from 68,281 unigenes as potential molecular markers. Twenty-six EST-SSRs were selected for estimating polymorphism and transferability among Amentotaxus species, of which 23 EST-SSRs were polymorphic within Amentotaxus species. Among these, the number of alleles ranged from 1–4, the polymorphism information content ranged from 0.000–0.692, and the observed and expected heterozygosity were 0.000–1.000 and 0.080–0.740, respectively. Population genetic structure analyses confirmed that A. argotaenia and A. formosana were separate species and A. yunnanensis and A. poilanei were the same species. These novel EST-SSRs can facilitate further population genetic structure research of Amentotaxus species.

  16. A combined functional and structural genomics approach identified an EST-SSR marker with complete linkage to the Ligon lintless-2 genetic locus in cotton (Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Tang Yuhong

    2011-09-01

    Full Text Available Abstract Background Cotton fiber length is an important quality attribute to the textile industry and longer fibers can be more efficiently spun into yarns to produce superior fabrics. There is typically a negative correlation between yield and fiber quality traits such as length. An understanding of the regulatory mechanisms controlling fiber length can potentially provide a valuable tool for cotton breeders to improve fiber length while maintaining high yields. The cotton (Gossypium hirsutum L. fiber mutation Ligon lintless-2 is controlled by a single dominant gene (Li2 that results in significantly shorter fibers than a wild-type. In a near-isogenic state with a wild-type cotton line, Li2 is a model system with which to study fiber elongation. Results Two near-isogenic lines of Ligon lintless-2 (Li2 cotton, one mutant and one wild-type, were developed through five generations of backcrosses (BC5. An F2 population was developed from a cross between the two Li2 near-isogenic lines and used to develop a linkage map of the Li2 locus on chromosome 18. Five simple sequence repeat (SSR markers were closely mapped around the Li2 locus region with two of the markers flanking the Li2 locus at 0.87 and 0.52 centimorgan. No apparent differences in fiber initiation and early fiber elongation were observed between the mutant ovules and the wild-type ones. Gene expression profiling using microarrays suggested roles of reactive oxygen species (ROS homeostasis and cytokinin regulation in the Li2 mutant phenotype. Microarray gene expression data led to successful identification of an EST-SSR marker (NAU3991 that displayed complete linkage to the Li2 locus. Conclusions In the field of cotton genomics, we report the first successful conversion of gene expression data into an SSR marker that is associated with a genomic region harboring a gene responsible for a fiber trait. The EST-derived SSR marker NAU3991 displayed complete linkage to the Li2 locus on

  17. MannDB: A microbial annotation database for protein characterization

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Lam, M; Smith, J; Zemla, A; Dyer, M; Kuczmarski, T; Vitalis, E; Slezak, T

    2006-05-19

    MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high

  18. Detection of genome donor species of neglected tetraploid crop Vigna reflexo-pilosa (creole bean, and genetic structure of diploid species based on newly developed EST-SSR markers from azuki bean (Vigna angularis.

    Directory of Open Access Journals (Sweden)

    Sompong Chankaew

    Full Text Available Vigna reflexo-pilosa, which includes a neglected crop, is the only one tetraploid species in genus Vigna. The ancestral species that make up this allotetraploid species have not conclusively been identified, although previous studies suggested that a donor genome of V. reflexo-pilosa is V. trinervia. In this study, 1,429 azuki bean EST-SSR markers were developed of which 38 EST-SSR primer pairs that amplified one product in diploid species and two discrete products in tetraploid species were selected to analyze 268 accessions from eight taxa of seven Asian Vigna species including V. reflexo-pilosa var. glabra, V. reflexo-pilosa var. reflexo-pilosa, V. exilis, V. hirtella, V. minima, V. radiata var. sublobata, V. tenuicaulis and V. trinervia to identify genome donor of V. reflexo-pilosa. Since both diploid and tetraploid species were analyzed and each SSR primer pair detected two loci in the tetraploid species, we separated genomes of the tetraploid species into two different diploid types, viz. A and B. In total, 445 alleles were detected by 38 EST-SSR markers. The highest gene diversity was observed in V. hirtella. By assigning the discrete PCR products of V. reflexo-pilosa into two distinguished genomes, we were able to identify the two genome donor parents of créole bean. Phylogenetic and principal coordinate analyses suggested that V. hirtella is a species complex and may be composed of at least three distinct taxa. Both analyses also clearly demonstrated that V. trinervia and one taxon of V. hirtella are the genome donors of V. reflexo-pilosa. Gene diversity indicates that the evolution rate of EST-SSRs on genome B of créole bean might be faster than that on genome A. Species relationship among the Vigna species in relation to genetic data, morphology and geographical distribution are presented.

  19. Detection of genome donor species of neglected tetraploid crop Vigna reflexo-pilosa (créole bean), and genetic structure of diploid species based on newly developed EST-SSR markers from azuki bean (Vigna angularis).

    Science.gov (United States)

    Chankaew, Sompong; Isemura, Takehisa; Isobe, Sachiko; Kaga, Akito; Tomooka, Norihiko; Somta, Prakit; Hirakawa, Hideki; Shirasawa, Kenta; Vaughan, Duncan A; Srinives, Peerasak

    2014-01-01

    Vigna reflexo-pilosa, which includes a neglected crop, is the only one tetraploid species in genus Vigna. The ancestral species that make up this allotetraploid species have not conclusively been identified, although previous studies suggested that a donor genome of V. reflexo-pilosa is V. trinervia. In this study, 1,429 azuki bean EST-SSR markers were developed of which 38 EST-SSR primer pairs that amplified one product in diploid species and two discrete products in tetraploid species were selected to analyze 268 accessions from eight taxa of seven Asian Vigna species including V. reflexo-pilosa var. glabra, V. reflexo-pilosa var. reflexo-pilosa, V. exilis, V. hirtella, V. minima, V. radiata var. sublobata, V. tenuicaulis and V. trinervia to identify genome donor of V. reflexo-pilosa. Since both diploid and tetraploid species were analyzed and each SSR primer pair detected two loci in the tetraploid species, we separated genomes of the tetraploid species into two different diploid types, viz. A and B. In total, 445 alleles were detected by 38 EST-SSR markers. The highest gene diversity was observed in V. hirtella. By assigning the discrete PCR products of V. reflexo-pilosa into two distinguished genomes, we were able to identify the two genome donor parents of créole bean. Phylogenetic and principal coordinate analyses suggested that V. hirtella is a species complex and may be composed of at least three distinct taxa. Both analyses also clearly demonstrated that V. trinervia and one taxon of V. hirtella are the genome donors of V. reflexo-pilosa. Gene diversity indicates that the evolution rate of EST-SSRs on genome B of créole bean might be faster than that on genome A. Species relationship among the Vigna species in relation to genetic data, morphology and geographical distribution are presented.

  20. 木薯基因组SSR和EST-SSR在麻疯树和橡胶树中的通用性分析%Transferability Analysis of Cassava EST-SSR and Genomic-SSR Markers in Jatropha and Rubber Tree

    Institute of Scientific and Technical Information of China (English)

    文明富; 陈新; 王海燕; 卢诚; 王文泉

    2011-01-01

    利用木薯的419对EST-SSR引物和182对基因组SSR引物在5个麻疯树品系和2个橡胶树品系中进行通用性分析.结果显示,木薯EST-SSR在麻疯树和橡胶树中的通用性比例分别为55.85%和38.90%,而木薯基因组SSR在麻疯树和橡胶树中的通用性比例分别为37.36%和26.37%.由此推测,EST-SSR的通用性高于基因组SSR.此外,木薯EST-SSR和基因组SSR的通用件在麻疯树中高于在橡胶树中.本研究发掘的通用性SSR引物可以用于木薯、麻疯树和橡胶树间的比较作网、基因发掘和QTL定位研究.%Euphorbiaceae family includes abundant economic species, such as rubber tree, cassava, castor bean, and Jatropha.Cassava (Manihot esculenta Crantz) ranks in the sixth food crop in the world.In China, cassava is also an important tropical economic crop.The genomic-SSRs derived from cassava genome, and EST-SSRs derived from expressed sequence tags (ESTs).In this study, the transferability of 419 pairs of EST-SSR primer and 182 pairs of genomic-SSR primer from cassava was tested in five Jatropha lines and two rubber tree lines.The results showed that the transferability rate of cassava EST-SSR in Jatropha and rubber tree was 55.85% and 38.90%, and the transferability rate of cassava genomic-SSR in Jatropha and rubber tree was 37.36% and 26.37%, respectively.The transferability EST-SSR was higher for cssava than that of genomic-SSR.Besides, the transferability of cassava EST-SSR and genomic-SSR was higher in Jatropha than in rubber tree.These results suggested that the cassava SSR can be used for comparative mapping, gene tagging and QTL mapping among cassava, Jatropha, and rubber tree.

  1. Transcriptome Sequencing Analysis and Development of EST-SSR Markers for Pinus koraiensis%红松转录组SSR分析及EST-SSR标记开发

    Institute of Scientific and Technical Information of China (English)

    张振; 张含国; 莫迟; 张磊

    2015-01-01

    ) breeding programs,lack of co-dominant genetic markers constrained the development of molecular marker assisted breeding. At present,development of SSR markers based on transcriptome data is still an economic and efficient development strategy of DNA molecular markers. In this study,we used high-throughput sequencing technology to develop EST-SSR markers for Korean pine. Distribution patterns of the markers in the transcriptome sequences and their characteristics were analyzed,in order to provide a basis for analysis of SSR diversity and mutation of Korean pine. [Method]A total of 1 757 SSR sites were identified from 41 476 unigenes in Korean pine transcriptome by using SSR searching program. Statistical analyses were conducted for number,distribution and characteristics of the SSR loci. And 101 pairs of SSR primers were designed and synthesized. Agarose electrophoresis was used for initial check and polyacrylamide gel electrophoresis for separation and detection of the polymorphisms of primers. Then amplification products were collected and sequenced for validation. Finally,16 pairs of SSR primers and 6 pairs of fluorescence primers were identified. To study the genetic variation,53 samples of open-pollinated progeny were collected from four seed orchards respectively in Hegang,Linkou,Tieli and Weihe. [Result]The distribution frequency of EST-SSRs ( ratio of the number of SSRs to the total number of unigenes) was 4. 24%,based on the transcriptome sequences. Mononucleotide dinucleotide and trinucleotide repeats were 46. 90% ,17. 12% and 34. 66% of total SSR, respectively. The SSR repeat number of SSR repeat units was between 5 and 24. Twenty-one pairs of primers showed polymorphism among 101 pairs of primer,which accounting for 20. 8% of the total number of primer pairs. By sequencing validation,16 pairs of primers amplified the target sequence. Eighteen alleles were tested from 6 pairs of fluorescence primers. Polymorphic information content ( PIC) was 0. 036 3 - 0. 667 4

  2. Mining and transferability analysis of EST-SSR primers in Kiwifruit (Actinidia spp.)%猕猴桃EST-SSR引物筛选及通用性分析

    Institute of Scientific and Technical Information of China (English)

    廖娇; 黄春辉; 辜青青; 曲雪艳; 徐小彪

    2011-01-01

    ESTs of Actinidia in the NCBI database were downloaded and screened by SSRHunter software, the EST-SSR primers were designed by Primer 5.0, and their transferabilities were analyzed in some Citrus plants. The 97 EST-SSR primers which were deigned were mined by using genomic DNA of Actinidia lijiangensis. The results indicated that the 77 pairs of primer showed amplification, accounting for 79.38%. Twenty pairs of primer selected were detected to PCR for DNAs from 10 A ctinidia varieties, the 17 primer pairs of the 20 showed amplification and polymorphism, accounting for 85%. The transferability of 20 pairs of EST-SSR primers which were randomly selected was explored in 9 Citrus germplasms (Okit-su, Kumquat, Trifoliate Orange, Ponkan, Sour Pummelo, Sweet Orange, Huyou, Owari, Miyagawa). The results showed that the 11 primer pairs of the 20 tested primers had the amplification, accounting for 55%. And the 8 primer pairs showed polymorphism, accounting for 72.7%. The results revealed that the EST-SSR markers in A ctinidia were transferable in some Citrus germplasms.%应用SSRHunter软件对NCBI公共数据库中的猕猴桃EST序列进行筛查,利用Primer5.0软件设计引物并进行筛选,同时对其在部分柑橘类植物的通用性进行分析。以漓江猕猴桃DNA为模板,对所设计的97对EST-SSR引物进行筛选,结果表明其中77对引物能扩增出清晰条带,有效扩增率为79-38%。随机选取20对引物对10份猕猴桃资源进行检测,结果发现有17对引物对所有材料都有扩增产物并呈现出多态性,多态性扩增率为85%。随机应用20对有效EST-SSR引物对兴津、金柑、枳壳、桠柑、酸柚、甜橙、胡柚、尾张、宫川等柑橘类植物基因组DNA进行扩增,结果表明,其中有11对引物在供试材料中有扩增产物,占引物总数的55%:有8对引物的扩增产物具有多态性,扩增率为72.7%。据此,基于猕猴桃EST序列而筛选的SSR

  3. Assessment of Functional EST-SSR Markers (Sugarcane in Cross-Species Transferability, Genetic Diversity among Poaceae Plants, and Bulk Segregation Analysis

    Directory of Open Access Journals (Sweden)

    Shamshad Ul Haq

    2016-01-01

    Full Text Available Expressed sequence tags (ESTs are important resource for gene discovery, gene expression and its regulation, molecular marker development, and comparative genomics. We procured 10000 ESTs and analyzed 267 EST-SSRs markers through computational approach. The average density was one SSR/10.45 kb or 6.4% frequency, wherein trinucleotide repeats (66.74% were the most abundant followed by di- (26.10%, tetra- (4.67%, penta- (1.5%, and hexanucleotide (1.2% repeats. Functional annotations were done and after-effect newly developed 63 EST-SSRs were used for cross transferability, genetic diversity, and bulk segregation analysis (BSA. Out of 63 EST-SSRs, 42 markers were identified owing to their expansion genetics across 20 different plants which amplified 519 alleles at 180 loci with an average of 2.88 alleles/locus and the polymorphic information content (PIC ranged from 0.51 to 0.93 with an average of 0.83. The cross transferability ranged from 25% for wheat to 97.22% for Schlerostachya, with an average of 55.86%, and genetic relationships were established based on diversification among them. Moreover, 10 EST-SSRs were recognized as important markers between bulks of pooled DNA of sugarcane cultivars through BSA. This study highlights the employability of the markers in transferability, genetic diversity in grass species, and distinguished sugarcane bulks.

  4. DEVELOPMENT OF EST-SSR MARKERS LINKED TO ME1,A NEMATODES RESISTANCE GENE IN PEPPER%与辣椒抗根结线虫基因Me1紧密连锁的EST-SSR标记开发

    Institute of Scientific and Technical Information of China (English)

    张宇; 张晓芬; 陈斌; 耿三省; 李焕秀

    2011-01-01

    以含抗根结线虫病基因Me1的PM217与感线虫品种茄门,及其F1、F2作为试验材料,采用南方根结线虫(Meloidogyne spp.)人工接种鉴定,根据鉴定结果,利用分离群体分组分析法(bulked segregantanalysis,BSA)建立抗感池,共筛选到3对(118、141及211)多态性EST-SSR引物在抗感池间存在差异。利用Joinmap3.0软件,结合抗病性鉴定,对F2群体进行SSR分析,研究3个EST-SSR标记位点与根结线虫抗性基因Me1的连锁关系,结果表明,EST-SSR标记141、118及211与Me1的遗传距离分别为6.984、18.684和29.310cM。本研究得到的与辣椒抗根结线虫显性单基因Me1紧密连锁的EST-SSR标记,为Me1基因的标记辅助选择提供了参考。%In this study,we investigated the resistance to Meloidogyne spp with artifical inoculation systems among F2 lines derivered from susceptible variety Qiemen and resistant variety PM217 carried nematodes resistance gene Me1,with artificial Meloidogyne spp.inoculation systems.Based on the phenotype,we constructed a resistance bulk and a susceptible bulks to screen out the linkage markers.Using EST-SSR technology,the combination primers 118,141 and 211 were shown to be linked to Me1.Furthermore,with exponding population,the three markers were mapped to the locus with genetic distances of 6.984,18.684 and 29.310cM,respectively using Joinmap3.0 software.Identification of EST-SSR molecular markers linked to the dominant resistant gene Me1 laid laying the foundation for resistance breeding.

  5. EST-SSR Markers Development from Cucurbita and Their Use in Purity Testing of F1 Hybrid Seed%南瓜属EST-SSR标记的开发及在杂种纯度鉴定中的应用

    Institute of Scientific and Technical Information of China (English)

    张国裕; 张帆; 姜立纲; 翟伟卜; 李海真

    2011-01-01

    Expressed Sequence Tags(ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The object of the present study is to develop EST-SSR markers from Cucurbita and uses these markers in quickly purity testing of F1 hybrid seed. The availability of 1457 ESTs for cucurbita,which documented in GenBank, provided a unique opportunity to develop EST-SSR markers, and 215 SSRs were identified among these non-redundant EST sequences. These SSRs contained 2 - 6 bp nucleotide motifs including 111 different motif types. The trinucleotide repeat is the dominant type,accounting for 40% with 86 motifs, and the frequency of occurrence of dinucleotide repeat is 28. 84% with 62 motifs. Among these motifs,CT motif (21 ,9. 77% ) was the most common type,and then were TC and AAG motifs,accounting for 6. 05% and 5. 58% , respectively. A total of 215 primer pairs were designed and 43 of them were synthesized and used to determine their usability by amplification in 4 Cucurbita inbred lines. The results show that 28 of them could successfully amplify, accounting for 65. 12% of testing primer pairs. Then these primer pairs were used to test the F, hybrid seed purity in Cucurbita. Seed genetic purity from 13 combinations ranged from 79. 2% to 100. 0% .which were in high accordance with those from field grow-out trials. Taken together, this study reported an effective and feasible approach to develop SSR markers for cucurbita,and demonstrated their usefulness in purity testing of hybrid squash seed.%为利用SSR标记进行快速、准确的南瓜杂交种纯度分析,进行了南瓜EST-SSR标记的开发.从NCBI数据库下载1 457条南瓜属作物EST序列,去除冗余序列后进行SSR位点搜索,共得到215条含有SSR位点的EST序列.这些SSR位点包含111种重复基元,其中二核苷酸(62个,占28.84%)和三核苷酸(86个,占40.0%)重复类型占主导地位;重复基元中出现最多的是CT(21个,占9.77

  6. EST-SSR Based Genetic Diversity Analysis on Salt Tolerant Plants from Six Species in Chenopodiaceae%藜科6种耐盐植物遗传多样性的EST-SSR分析

    Institute of Scientific and Technical Information of China (English)

    徐照龙; 易金鑫; 余桂红; 张大勇; 何晓兰; 王秀娥; 马鸿翔

    2011-01-01

    利用EST-SSR标记分析了藜科6种耐盐植物的遗传基础和遗传多样性,以期为藜科耐盐植物遗传育种提供快速、可靠的分子标记辅助选择工具.采用31对藜科海蓬子属和碱蓬属的EST-SSR引物对藜科6种植物进行PCR扩增,其中16对引物得到较好扩增,引物通用率为51.6%,共检测到18个多态性位点,每位点等位基因数2~4个,多态性丰富.进一步采用Nei's遗传距离聚类分析表明6种植物可以分为3组,主成分分析也支持上述分组,而且DY529957、DY529903和DY5298853个EST在分组中贡献率最高.经与GenBank中序列相似性比对,前两者分别编码生长素抑制蛋白(Auxin-repressed protein,ARP)和植物防御素(Defensins,Def),都参与植物逆境胁迫响应,但分属于不同代谢途径;后者则编码未知蛋白.总体而言,16对SSR引物在藜科6种植物间具有较好的通用性,能够揭示该6种植物间广泛的遗传多样性,及其存在不同耐盐机制提供分子证据.%This report focus on EST-SSR based evaluation of genetic diversity in salt tolerant plant from six species in Chenopodiaceae. Thirty-one pairs of EST-SSR primers were designed according to ESTs sequence collected from Salicornia and Suaeda genera. Only sixteen out of all primer pairs successfully amplified the DNA fragments by using PCR procedure across all samples, which demonstrated 51.6% over all primers was transferrable. Total 18 polymorphic loci were detected by the 16 primer pairs, and allele number at each locus ranged from 2 to 4, indicating a wide range of genetic diversity. Clusterring analysis based on Nei's genetic distance showed that the six plants could be grouped into three clades, and the division was confirmed by principal component analysis. Moreover, this grouping profile was mainly attributed to polymorphism of three ESTs, e. g. DY529957, DY529903 and DY529885.According to the sequence similarity, the three ESTs were assumed to encode an auxin

  7. Discovery and Characterization of Chromatin States for Systematic Annotation of the Human Genome

    Science.gov (United States)

    Ernst, Jason; Kellis, Manolis

    A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks.We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, largescale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.

  8. Annotated English

    CERN Document Server

    Hernandez-Orallo, Jose

    2010-01-01

    This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and located in such a way that the original English text is not altered (not even a letter), thus allowing for a consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that make the frequency of annotations not dramatically high. This makes the reader easily associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which otherwise are weakened by the enormous amount of exceptions in English pronunciation. The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size. This means that we can get an annotated version of any document or book with the same number of pages and fontsize. Since no letter is affected, the ...

  9. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available BACKGROUND: Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. RESULTS: Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51% unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17% unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. CONCLUSION: The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  10. Development and Characterization of Microsatellite Markers from the Transcriptome of Firmiana danxiaensis (Malvaceae s.l.

    Directory of Open Access Journals (Sweden)

    Qiang Fan

    2013-11-01

    Full Text Available Premise of the study: Firmiana consists of 12–16 species, many of which are narrow endemics. Expressed sequence tag (EST–simple sequence repeat (SSR markers were developed and characterized for size polymorphism in four Firmiana species. Methods and Results: A total of 102 EST-SSR primer pairs were designed based on the transcriptome sequences of F. danxiaensis; these were then characterized in four Firmiana species—F. danxiaensis, F. kwangsiensis, F. hainanensis, and F. simplex. In these four species, 17 primer pairs were successfully amplified, and 14 were polymorphic in at least one species. The number of alleles ranged from one to 13, and the observed and expected heterozygosities ranged from 0 to 1 and 0 to 0.925, respectively. The lowest level of polymorphism was observed in F. danxiaensis. Conclusions: These polymorphic EST-SSR markers are valuable for conservation genetics studies in the endangered Firmiana species.

  11. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%, amplicons were successfully generated with high quality. Seventeen (89.5% showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.

  12. 不同来源 SSR 和 EST-SSR 在披碱草属和鹅观草属物种中的通用性分析%The transferability of SSR and EST-SSR markers of different origins in Elymus and Roegneria in the Triticeae (Poaceae)

    Institute of Scientific and Technical Information of China (English)

    陈仕勇; 马啸; 张新全; 陈智华; 周凯

    2016-01-01

    important forage grasses,and also carry elite genes for improving cereal crops.However,there is ongoing dispute on the species boundaries and interspecific systematic relationships in the two genera.The biosystematic relationships in the group are of keen interest to agrostologists around the world.Based on known SSR markers in Triticeae,theobjective of this study was to screen out the high transferability markers for Elymus and Roegneria,which would provide important information for understanding the biosystematic relationships of the two genera.A to-tal of 230 simple sequence repeats (SSR)and SSR based on expressed sequence tags (EST -SSR)markers from 5 different genera including wheat,barley,Elymus ,Pseudoroegneria and Leymus ,were used to study the transferability to 23 species containing the St,H,and Y genomes.Among the 230 SSR markers,163 (70.87%)markers could generate clear bands,which showed a high transferability for those markers.The EST-SSR markers (87.60%)showed a higher transferability than genomic SSR markers (49.50%),but the genomic SSR markers showed a higher polymorphism (85.98%)than the EST-SSR markers (79.37%).A total of 579 bands were amplified from the 163 SSR markers able to generate clear bands,of which 533 bands were polymorphic.The number of amplified bands of each of these 163 SSR markers ranged from 1 to 11 with an average of 3.55 bands.Cluster analysis using the unweighted pair group method with arithmetic average (UPGMA)showed that the species with same or similar genome could be grouped together.Additionally,the H genome showed a rather distant phylogenetic relationship with St genome,and distant phylogenetic relation-ships were also revealed between the diploid species containing the H genome and other species in the study. The selected SSR markers from 5 genera could be amplified successfully in Elymus and Roegneria.To sum-marise,a total of 163 high transferability SSR markers were found to be suitable for the further phylogeny a

  13. 松属GAs相关EST-SSR标记与火炬松×洪都拉斯加勒比松苗高相关性分析%Analysis on correlation of EST-SSR markers relating to GAs metabolism for Pinus L. with the seedling height of Pinus taeda x P. caribaea var. hondurensis

    Institute of Scientific and Technical Information of China (English)

    栾启福; 姜景民; 张守攻

    2011-01-01

    以火炬松×洪都拉斯加勒比松F1代群体为研究对象,从松树PGI(松类基因索弓1)数据中筛选出13个与赤霉素(GAs)代谢有关的序列.设计了这13条序列的EST-SSR引物对,并筛选出4对引物作为F.代检测的较好的标记.4对引物PCR分析显示在2个亲本和39个子代中共扩增出1 014个多态性位点,其中,杂种F1代扩增出的位点数中有50.19%与父本相同,52.17%与母本相同,这表明母本(火炬松)和父本(加勒比松)杂交能够得到获得双亲遗传物质的新杂种.4个引物检测的26个等位基因位点中有6个与苗龄6个月的苗高有显著或极显著的相关性,有5个位点与苗龄9个月的苗高有显著或极显著相关性.这为早期选择提供了较好的分子标记.%Thirteen EST-SSR molecular markers related to GAs metabolism for Pinus L. were developed from EST sequences in PGI( Pings gene index)database. 1 014 alleles were amplified with two parents and 39 F1 progeny of Pings taeda × P. caribaea var. hondurensis with four primer-pairs, which were selected from the thirteen EST-SSRs. 50. 19 % of the 1 014 alleles were originated from the male parent(P, taeda) and 52. 17 % were shared with the female parent (P. caribaea var. hondurensis), which showed that the progeny of this hybridization inherent genetic materials from both parents. Six locus selected from all 26 locus amplified by four primer pairs were related significantly to six-month old seedling height of the 39 F1 hybrids, and five locus related significantly to nine-month old seedling height. Therefore,this markets can serve as good molecular markers for early selection.

  14. Maize microarray annotation database

    Directory of Open Access Journals (Sweden)

    Berger Dave K

    2011-10-01

    Full Text Available Abstract Background Microarray technology has matured over the past fifteen years into a cost-effective solution with established data analysis protocols for global gene expression profiling. The Agilent-016047 maize 44 K microarray was custom-designed from EST sequences, but only reporter sequences with EST accession numbers are publicly available. The following information is lacking: (a reporter - gene model match, (b number of reporters per gene model, (c potential for cross hybridization, (d sense/antisense orientation of reporters, (e position of reporter on B73 genome sequence (for eQTL studies, and (f functional annotations of genes represented by reporters. To address this, we developed a strategy to annotate the Agilent-016047 maize microarray, and built a publicly accessible annotation database. Description Genomic annotation of the 42,034 reporters on the Agilent-016047 maize microarray was based on BLASTN results of the 60-mer reporter sequences and their corresponding ESTs against the maize B73 RefGen v2 "Working Gene Set" (WGS predicted transcripts and the genome sequence. The agreement between the EST, WGS transcript and gDNA BLASTN results were used to assign the reporters into six genomic annotation groups. These annotation groups were: (i "annotation by sense gene model" (23,668 reporters, (ii "annotation by antisense gene model" (4,330; (iii "annotation by gDNA" without a WGS transcript hit (1,549; (iv "annotation by EST", in which case the EST from which the reporter was designed, but not the reporter itself, has a WGS transcript hit (3,390; (v "ambiguous annotation" (2,608; and (vi "inconclusive annotation" (6,489. Functional annotations of reporters were obtained by BLASTX and Blast2GO analysis of corresponding WGS transcripts against GenBank. The annotations are available in the Maize Microarray Annotation Database http://MaizeArrayAnnot.bi.up.ac.za/, as well as through a GBrowse annotation file that can be uploaded to

  15. Ubiquitous Annotation Systems

    DEFF Research Database (Denmark)

    Hansen, Frank Allan

    2006-01-01

    Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...

  16. De novo assembly, annotation, and characterization of the whole brain transcriptome of male and female Syrian hamsters

    Science.gov (United States)

    McCann, Katharine E.; Sinkiewicz, David M.; Norvelle, Alisa; Huhman, Kim L.

    2017-01-01

    Hamsters are an ideal animal model for a variety of biomedical research areas such as cancer, virology, circadian rhythms, and behavioural neuroscience. The use of hamsters has declined, however, most likely due to the dearth of genetic tools available for these animals. Our laboratory uses hamsters to study acute social stress, and we are beginning to investigate the genetic mechanisms subserving defeat-induced behavioural change. We have been limited, however, by the lack of genetic resources available for hamsters. In this study, we sequenced the brain transcriptome of male and female Syrian hamsters to generate the necessary resources to continue our research. We completed a de novo assembly and after assembly optimization, there were 113,329 transcripts representing 14,530 unique genes. This study is the first to characterize transcript expression in both female and male hamster brains and offers invaluable information to promote understanding of a host of important biomedical research questions for which hamsters are an excellent model. PMID:28071753

  17. Annotating Coloured Petri Nets

    DEFF Research Database (Denmark)

    Lindstrøm, Bo; Wells, Lisa Marie

    2002-01-01

    Coloured Petri nets (CP-nets) can be used for several fundamentally different purposes like functional analysis, performance analysis, and visualisation. To be able to use the corresponding tool extensions and libraries it is sometimes necessary to include extra auxiliary information in the CP-ne...... a certain use of the CP-net. We define the semantics of annotations by describing a translation from a CP-net and the corresponding annotation layers to another CP-net where the annotations are an integrated part of the CP-net....... a method which makes it possible to associate auxiliary information, called annotations, with tokens without modifying the colour sets of the CP-net. Annotations are pieces of information that are not essential for determining the behaviour of the system being modelled, but are rather added to support...

  18. Personnalisation de Syst\\`emes OLAP Annot\\'es

    CERN Document Server

    Jerbi, Houssem; Ravat, Franck; Teste, Olivier

    2010-01-01

    This paper deals with personalization of annotated OLAP systems. Data constellation is extended to support annotations and user preferences. Annotations reflect the decision-maker experience whereas user preferences enable users to focus on the most interesting data. User preferences allow annotated contextual recommendations helping the decision-maker during his/her multidimensional navigations.

  19. An Introduction to Genome Annotation.

    Science.gov (United States)

    Campbell, Michael S; Yandell, Mark

    2015-12-17

    Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.

  20. Semantic annotation of mutable data.

    Science.gov (United States)

    Morris, Robert A; Dou, Lei; Hanken, James; Kelly, Maureen; Lowery, David B; Ludäscher, Bertram; Macklin, James A; Morris, Paul J

    2013-01-01

    Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.

  1. Semantic annotation of mutable data.

    Directory of Open Access Journals (Sweden)

    Robert A Morris

    Full Text Available Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.

  2. NCBI和cDNA文库中栽培花生EST-SSR分子标记的开发及其特点%Development and Characterization of EST-SSR Markers from NCBI and cDNA Library in Cultivated Peanut (Arachis hypogaea L.)

    Institute of Scientific and Technical Information of China (English)

    王金彦; 潘丽娟; 杨庆利; 禹山林

    2009-01-01

    86 132 ESTs downloaded from GenBank in NCBI and 12 501 ESTs from cDNA library constructed by high-oil linoleic acid accession E 12 were analysed. After the preprocession, there were 18 051 singletons and 9 972 contigs in the GenBank of NCBI and cDNA library. Totally 3 104 SSR loci had been screened by MISA software, accounting for 11.08% for these non-redundant ESTs. All SSR loci are divided into di-nucleotide, thi-nucleotide, tetra-nucleotide, penta-nucleotide, hexa-nucleotide and multi-nucleotide etc., and thi-nucleotide motif is the most motifs and the frequency was 43.0% and 56.8% in NCBI and cDNA libraray, respectively. The number of di-and penta-nucleotide motifs were second and third in all motifs. And the hexa-nucleotide was the least mo-tif both in NCBI and cDNA library. In all repeat motifs nucleotide, AG/TC was the most motifs and accounted for 8.65% and 13.42% in NCBI and cDNA library, respectively. Among the tri-nucleotide repeats, CTT/GAA was the most frequent motif, accounting for 6.7% and 13.42%, respectively. The repeat unit number of SSR loci is from 4 to 51.%本研究利用NCBI的GenBank数据库中公布的花生86 132条EST序列以及利用高油酸品种E12所创建的cDNA文库中的12 501条EST序列,对这些序列进行前期处理,总共获得非冗余且拼接较长的singleton 11 260条,contig 9 972条.通过MISA软件分析发现两个EST库中共包含有3 104个SSR位点,占到总共非冗余序列的11.08%.这些SSR位点被分成二核苷酸重复、三核苷酸重复、四核苷酸重复、五核苷酸重复、六核苷酸重复以及混合核苷酸重复等,其中三核苷酸重复占的比例最多,分别占到NCBI和cDNA文库的43.0%和56.8%,二核苷酸和五核苷酸重复占到所有重复位点的第二位和第三位,六核苷酸重复的比例最少.在所有重复基序中,AG/TC重复的数量最多,分别占到NCBI和cDNA文库的8.65%和13.42%.在三核苷酸重复中,CTT/GAA出现的频率最大,分别占到6.7%和13.42%.所有这些SSR基序的长度在4~51个之间.

  3. Human Genome Annotation

    Science.gov (United States)

    Gerstein, Mark

    A central problem for 21st century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) based on processing next-generation sequencing experiments. I will further explain how we cluster together groups of sites to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome and analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation. Much of this work has been carried out in the framework of the ENCODE, modENCODE, and 1000 genomes projects.

  4. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, D. [UCLA; Casero, D. [UCLA; Cokus, S. J. [UCLA; Merchant, S. S. [UCLA; Pellegrini, M. [UCLA

    2012-07-01

    The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

  5. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  6. Cheating. An Annotated Bibliography.

    Science.gov (United States)

    Wildemuth, Barbara M., Comp.

    This 89-item, annotated bibliography was compiled to provide access to research and discussions of cheating and, specifically, cheating on tests. It is not limited to any educational level, nor is it confined to any specific curriculum area. Two data bases were searched by computer, and a library search was conducted. A computer search of the…

  7. Annotated bibliography traceability

    NARCIS (Netherlands)

    Narain, G.

    2006-01-01

    This annotated bibliography contains summaries of articles and chapters of books, which are relevant to traceability. After each summary there is a part about the relevancy of the paper for the LEI project. The aim of the LEI-project is to gain insight in several aspects of traceability in order to

  8. Annotation of Regular Polysemy

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector

    Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...

  9. Collaborative Movie Annotation

    Science.gov (United States)

    Zad, Damon Daylamani; Agius, Harry

    In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which are increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.

  10. Annotated Bibliography. First Edition.

    Science.gov (United States)

    Haring, Norris G.

    An annotated bibliography which presents approximately 300 references from 1951 to 1973 on the education of severely/profoundly handicapped persons. Citations are grouped alphabetically by author's name within the following categories: characteristics and treatment, gross motor development, sensory and motor development, physical therapy for the…

  11. Annotation: The Savant Syndrome

    Science.gov (United States)

    Heaton, Pamela; Wallace, Gregory L.

    2004-01-01

    Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…

  12. Annotation of Ehux ESTs

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-06-12

    22 percent ESTs do no align with scaffolds. EST Pipeleine assembles 17126 consensi from the noaligned ESTs. Annotation Pipeline predicts 8564 ORFS on the consensi. Domain analysis of ORFs reveals missing genes. Cluster analysis reveals missing genes. Expression analysis reveals potential strain specific genes.

  13. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  14. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  15. Expressed Sequence Tags Analysis and Design of Simple Sequence Repeats Markers from a Full-Length cDNA Library in Perilla frutescens (L.

    Directory of Open Access Journals (Sweden)

    Eun Soo Seong

    2015-01-01

    Full Text Available Perilla frutescens is valuable as a medicinal plant as well as a natural medicine and functional food. However, comparative genomics analyses of P. frutescens are limited due to a lack of gene annotations and characterization. A full-length cDNA library from P. frutescens leaves was constructed to identify functional gene clusters and probable EST-SSR markers via analysis of 1,056 expressed sequence tags. Unigene assembly was performed using basic local alignment search tool (BLAST homology searches and annotated Gene Ontology (GO. A total of 18 simple sequence repeats (SSRs were designed as primer pairs. This study is the first to report comparative genomics and EST-SSR markers from P. frutescens will help gene discovery and provide an important source for functional genomics and molecular genetic research in this interesting medicinal plant.

  16. Mesotext. Framing and exploring annotations

    NARCIS (Netherlands)

    Boot, P.; Boot, P.; Stronks, E.

    2007-01-01

    From the introduction: Annotation is an important item on the wish list for digital scholarly tools. It is one of John Unsworth’s primitives of scholarship (Unsworth 2000). Especially in linguistics,a number of tools have been developed that facilitate the creation of annotations to source material

  17. Collective dynamics of social annotation

    CERN Document Server

    Cattuto, Ciro; Baldassarri, Andrea; Schehr, G; Loreto, Vittorio

    2009-01-01

    The enormous increase of popularity and use of the WWW has led in the recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with text keywords dubbed tags. Understanding the rich emerging structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular concepts borrowed from statistical physics, such as random walks, and the complex networks framework, can effectively contribute to the mathematical modeling of social annotation systems. Here we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of random walks. This modeling framework reproduces several aspects, so far unexplained, of social annotation, among which the peculiar growth of the size of the...

  18. Sequence alignment status and amplicon size difference affecting EST-SSR primer performance and polymorphism

    Science.gov (United States)

    Little attention has been given to failed, poorly-performing, and non-polymorphic expressed sequence tag (EST) simple sequence repeat (SSR) primers. This is due in part to a lack of interest and value in reporting them but also because of the difficulty in addressing the causes of failure on a prime...

  19. Sentiment Analysis of Document Based on Annotation

    CERN Document Server

    Shukla, Archana

    2011-01-01

    I present a tool which tells the quality of document or its usefulness based on annotations. Annotation may include comments, notes, observation, highlights, underline, explanation, question or help etc. comments are used for evaluative purpose while others are used for summarization or for expansion also. Further these comments may be on another annotation. Such annotations are referred as meta-annotation. All annotation may not get equal weightage. My tool considered highlights, underline as well as comments to infer the collective sentiment of annotators. Collective sentiments of annotators are classified as positive, negative, objectivity. My tool computes collective sentiment of annotations in two manners. It counts all the annotation present on the documents as well as it also computes sentiment scores of all annotation which includes comments to obtain the collective sentiments about the document or to judge the quality of document. I demonstrate the use of tool on research paper.

  20. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles

    Directory of Open Access Journals (Sweden)

    Boussaha Mekki

    2006-05-01

    Full Text Available Abstract Background The Fragile Histidine Triad gene (FHIT is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of human common fragile sites, where FHIT lays. Vesical tumors affect also cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. Results The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split in ten exons like in man, with exons 5 to 9 coding for a 149 amino acids protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced, that originate from an alternative usage of three variants of exon 4, revealing a size very close to the major human FHIT cDNAs. Conclusion A comparative genomic approach allowed to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will allow to study the role of FHIT in bovine cancerogenesis

  1. Automatic Function Annotations for Hoare Logic

    Directory of Open Access Journals (Sweden)

    Daniel Matichuk

    2012-11-01

    Full Text Available In systems verification we are often concerned with multiple, inter-dependent properties that a program must satisfy. To prove that a program satisfies a given property, the correctness of intermediate states of the program must be characterized. However, this intermediate reasoning is not always phrased such that it can be easily re-used in the proofs of subsequent properties. We introduce a function annotation logic that extends Hoare logic in two important ways: (1 when proving that a function satisfies a Hoare triple, intermediate reasoning is automatically stored as function annotations, and (2 these function annotations can be exploited in future Hoare logic proofs. This reduces duplication of reasoning between the proofs of different properties, whilst serving as a drop-in replacement for traditional Hoare logic to avoid the costly process of proof refactoring. We explain how this was implemented in Isabelle/HOL and applied to an experimental branch of the seL4 microkernel to significantly reduce the size and complexity of existing proofs.

  2. Semantic annotation of medical images

    Science.gov (United States)

    Seifert, Sascha; Kelm, Michael; Moeller, Manuel; Mukherjee, Saikat; Cavallaro, Alexander; Huber, Martin; Comaniciu, Dorin

    2010-03-01

    Diagnosis and treatment planning for patients can be significantly improved by comparing with clinical images of other patients with similar anatomical and pathological characteristics. This requires the images to be annotated using common vocabulary from clinical ontologies. Current approaches to such annotation are typically manual, consuming extensive clinician time, and cannot be scaled to large amounts of imaging data in hospitals. On the other hand, automated image analysis while being very scalable do not leverage standardized semantics and thus cannot be used across specific applications. In our work, we describe an automated and context-sensitive workflow based on an image parsing system complemented by an ontology-based context-sensitive annotation tool. An unique characteristic of our framework is that it brings together the diverse paradigms of machine learning based image analysis and ontology based modeling for accurate and scalable semantic image annotation.

  3. Publication Production: An Annotated Bibliography.

    Science.gov (United States)

    Firman, Anthony H.

    1994-01-01

    Offers brief annotations of 52 articles and papers on document production (from the Society for Technical Communication's journal and proceedings) on 9 topics: information processing, document design, using color, typography, tables, illustrations, photography, printing and binding, and production management. (SR)

  4. NCBI prokaryotic genome annotation pipeline.

    Science.gov (United States)

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

  5. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  6. Objective-guided image annotation.

    Science.gov (United States)

    Mao, Qi; Tsang, Ivor Wai-Hung; Gao, Shenghua

    2013-04-01

    Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so that they are inevitably trapped into suboptimal performance of these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to the analysis of various measures and the high time complexity of optimizing micro-averaging measures, in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four

  7. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  8. Collective dynamics of social annotation.

    Science.gov (United States)

    Cattuto, Ciro; Barrat, Alain; Baldassarri, Andrea; Schehr, Gregory; Loreto, Vittorio

    2009-06-30

    The enormous increase of popularity and use of the worldwide web has led in the recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with keywords known as "tags." Understanding the rich emergent structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular concepts borrowed from statistical physics, such as random walks (RWs), and complex networks theory, can effectively contribute to the mathematical modeling of social annotation systems. Here, we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of RWs. This modeling framework reproduces several aspects, thus far unexplained, of social annotation, among which are the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and that are typically hard to access.

  9. ProGMap: an integrated annotation resource for protein orthology

    NARCIS (Netherlands)

    Kuzniar, A.; Lin, K.; He, Y.; Nijveen, H.; Pongor, S.; Leunissen, J.A.M.

    2009-01-01

    Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web-tool designed to help researchers and database annotato

  10. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation.

    Directory of Open Access Journals (Sweden)

    Oscar Beijbom

    Full Text Available Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images captured at four Pacific coral reefs. Inter- and intra- annotator variability among six human experts was quantified and compared to semi- and fully- automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys.

  11. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchael Genomes

    Directory of Open Access Journals (Sweden)

    Friedhelm Pfeiffer

    2015-06-01

    Full Text Available Genome annotation errors are a persistent problem that impede research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae. Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins. To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt, to the benefit of the scientific community. The enhanced annotations are also publically available via HaloLex.

  12. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation

    Science.gov (United States)

    Beijbom, Oscar; Edmunds, Peter J.; Roelfsema, Chris; Smith, Jennifer; Kline, David I.; Neal, Benjamin P.; Dunlap, Matthew J.; Moriarty, Vincent; Fan, Tung-Yung; Tan, Chih-Jui; Chan, Stephen; Treibitz, Tali; Gamst, Anthony; Mitchell, B. Greg; Kriegman, David

    2015-01-01

    Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images captured at four Pacific coral reefs. Inter- and intra- annotator variability among six human experts was quantified and compared to semi- and fully- automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys. PMID:26154157

  13. Annotated Bibliography on Humanistic Education

    Science.gov (United States)

    Ganung, Cynthia

    1975-01-01

    Part I of this annotated bibliography deals with books and articles on such topics as achievement motivation, process education, transactional analysis, discipline without punishment, role-playing, interpersonal skills, self-acceptance, moral education, self-awareness, values clarification, and non-verbal communication. Part II focuses on…

  14. Meaningful Assessment: An Annotated Bibliography.

    Science.gov (United States)

    Thrond, Mary A.

    The annotated bibliography contains citations of nine references on alternative student assessment methods in second language programs, particularly at the secondary school level. The references include a critique of conventional reading comprehension assessment, a discussion of performance assessment, a proposal for a multi-trait, multi-method…

  15. Teacher Evaluation: An Annotated Bibliography.

    Science.gov (United States)

    McKenna, Bernard H.; And Others

    In his introduction to the 86-item annotated bibliography by Mueller and Poliakoff, McKenna discusses his views on teacher evaluation and his impressions of the documents cited. He observes, in part, that the current concern is with the process of evaluation and that most researchers continue to believe that student achievement is the most…

  16. Bioinformatics for plant genome annotation

    NARCIS (Netherlands)

    Fiers, M.W.E.J.

    2006-01-01

    Large amounts of genome sequence data are available and much more will become available in the near future. A DNA sequence alone has, however, limited use. Genome annotation is required to assign biological interpretation to the DNA sequence. This thesis describ

  17. Child Development: An Annotated Bibliography.

    Science.gov (United States)

    Dickerson, LaVerne Thornton, Comp.

    This annotated bibliography focuses on recent publications dealing with factors that influence child growth and development, rather than the developmental processes themselves. Topics include: general sources on child development; physical and perceptual-motor development; cognitive development; social and personality development; and play.…

  18. Nikos Kazantzakis: An Annotated Bibliography.

    Science.gov (United States)

    Qiu, Kui

    This research paper consists of an annotated bibliography about Nikos Kazantzakis, one of the major modern Greek writers and author of "The Last Temptation of Christ,""Zorba the Greek," and many other works. Because of Kazantzakis' position in world literature there are many critical works about him; however, bibliographical control of these works…

  19. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

    Directory of Open Access Journals (Sweden)

    Norihiro Maeda

    2006-04-01

    Full Text Available The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts, providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

  20. Annotating BI Visualization Dashboards: Needs and Challenges

    OpenAIRE

    Elias, Micheline; Bezerianos, Anastasia

    2012-01-01

    International audience; Annotations have been identified as an important aid in analysis record-keeping and recently data discovery. In this paper we discuss the use of annotations on visualization dashboards, with a special focus on business intelligence (BI) analysis. In-depth interviews with experts lead to new annotation needs for multi-chart visualization systems, on which we based the design of a dashboard prototype that supports data and context aware annotations. We focus particularly ...

  1. Are clickthrough data reliable as image annotations?

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; Vries, A.P. de; Delopoulos, A.

    2009-01-01

    We examine the reliability of clickthrough data as concept-based image annotations, by comparing them against manual annotations, for different concept categories. Our analysis shows that, for many concepts, the image annotations generated by using clickthrough data are reliable, with up to 90% of t

  2. Annotating images by mining image search results

    NARCIS (Netherlands)

    Wang, X.J.; Zhang, L.; Li, X.; Ma, W.Y.

    2008-01-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results

  3. Multi-annotation discursive de corpus écrit

    OpenAIRE

    Péry-Woodley, Marie-Paule

    2011-01-01

    National audience; On the basis of the experience acquired in the course of the ANNODIS project, the following questions are discussed: - what is the annotation campaign for? building an annotated " reference corpus" vs. annotation as an experiment; - defining annotation tasks. Naïve vs. expert annotation; - the annotation manual : from linguistic model to annotation protocol; - automatic pre-processing vs. manual annotation. Segmentation, tagging and mark-ups: steps in corpus preparation; - ...

  4. Dictionary-driven protein annotation.

    Science.gov (United States)

    Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

    2002-09-01

    Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/ bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were

  5. Automatic annotation of organellar genomes with DOGMA

    Energy Technology Data Exchange (ETDEWEB)

    Wyman, Stacia; Jansen, Robert K.; Boore, Jeffrey L.

    2004-06-01

    Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of extra-nuclear organellar (chloroplast and animal mitochondrial) genomes. It is a web-based package that allows the use of comparative BLAST searches to identify and annotate genes in a genome. DOGMA presents a list of putative genes to the user in a graphical format for viewing and editing. Annotations are stored on our password-protected server. Complete annotations can be extracted for direct submission to GenBank. Furthermore, intergenic regions of specified length can be extracted, as well the nucleotide sequences and amino acid sequences of the genes.

  6. Certifying cost annotations in compilers

    CERN Document Server

    Amadio, Roberto M; Régis-Gianas, Yann; Saillard, Ronan

    2010-01-01

    We discuss the problem of building a compiler which can lift in a provably correct way pieces of information on the execution cost of the object code to cost annotations on the source code. To this end, we need a clear and flexible picture of: (i) the meaning of cost annotations, (ii) the method to prove them sound and precise, and (iii) the way such proofs can be composed. We propose a so-called labelling approach to these three questions. As a first step, we examine its application to a toy compiler. This formal study suggests that the labelling approach has good compositionality and scalability properties. In order to provide further evidence for this claim, we report our successful experience in implementing and testing the labelling approach on top of a prototype compiler written in OCAML for (a large fragment of) the C language.

  7. Corpus Annotation for Parser Evaluation

    OpenAIRE

    CARROLL, JOHN; Minnen, Guido; Briscoe, Ted

    1999-01-01

    We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.

  8. Multilayer Neural Networks and Nearest Neighbor Classifier Performances for Image Annotation

    Directory of Open Access Journals (Sweden)

    Mustapha OUJAOURA

    2012-12-01

    Full Text Available The explosive growth of image data leads to the research and development of image content searching and indexing systems. Image annotation systems aim at annotating automatically animage with some controlled keywords that can be used for indexing and retrieval of images. This paper presents a comparative evaluation of the image content annotation system by using the multilayer neural networks and the nearest neighbour classifier. The region growing segmentation is used to separate objects, the Hu moments, Legendre moments and Zernike moments which are used in as feature descriptors for the image content characterization and annotation.The ETH-80 database image is used in the experiments here. The best annotation rate is achieved by using Legendre moments as feature extraction method and the multilayer neural network as a classifier

  9. Functional annotation of hierarchical modularity.

    Directory of Open Access Journals (Sweden)

    Kanchana Padmanabhan

    Full Text Available In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function-hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology and the association of individual genes or proteins with these concepts (e.g., GO terms, our method will assign a Hierarchical Modularity Score (HMS to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a bi-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our

  10. Omics data management and annotation.

    Science.gov (United States)

    Harel, Arye; Dalah, Irina; Pietrokovski, Shmuel; Safran, Marilyn; Lancet, Doron

    2011-01-01

    Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and managing. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as file-based systems and relational databases, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.

  11. Knowledge Annotation maknig implicit knowledge explicit

    CERN Document Server

    Dingli, Alexiei

    2011-01-01

    Did you ever read something on a book, felt the need to comment, took up a pencil and scribbled something on the books' text'? If you did, you just annotated a book. But that process has now become something fundamental and revolutionary in these days of computing. Annotation is all about adding further information to text, pictures, movies and even to physical objects. In practice, anything which can be identified either virtually or physically can be annotated. In this book, we will delve into what makes annotations, and analyse their significance for the future evolutions of the web. We wil

  12. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA

    Directory of Open Access Journals (Sweden)

    Camon Evelyn B

    2005-05-01

    Full Text Available Abstract Background The Gene Ontology Annotation (GOA database http://www.ebi.ac.uk/GOA aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from the careful manual curation of literature. However, as both the volume of literature and of proteins requiring characterization increases, the manual processing capability can become overloaded. Consequently, semi-automated aids are often employed to expedite the curation process. Traditionally, electronic techniques in GOA depend largely on exploiting the knowledge in existing resources such as InterPro. However, in recent years, text mining has been hailed as a potentially useful tool to aid the curation process. To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology challenge. BioCreAtIvE task 2 was an experiment to test if automatically derived classification using information retrieval and extraction could assist expert biologists in the annotation of the GO vocabulary to the proteins in the UniProt Knowledgebase. GOA provided the training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new Journal of Biological Chemistry articles used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of 9 participating groups, each of which provided highlighted sentences to support their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using literature and offer some suggestions to improve the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an

  13. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or function genomics approaches. Results We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, such method can generate highly accurate predictions in sparsely annotated portions of GO, in which previous algorithm failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than generally used cross-validations type of evaluations. By manually assessing a random sample of 100 predictions conducted in a historical roll-back evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340

  14. Towards an event annotated corpus of Polish

    Directory of Open Access Journals (Sweden)

    Michał Marcińczuk

    2015-12-01

    Full Text Available Towards an event annotated corpus of Polish The paper presents a typology of events built on the basis of TimeML specification adapted to Polish language. Some changes were introduced to the definition of the event categories and a motivation for event categorization was formulated. The event annotation task is presented on two levels – ontology level (language independent and text mentions (language dependant. The various types of event mentions in Polish text are discussed. A procedure for annotation of event mentions in Polish texts is presented and evaluated. In the evaluation a randomly selected set of documents from the Corpus of Wrocław University of Technology (called KPWr was annotated by two linguists and the annotator agreement was calculated. The evaluation was done in two iterations. After the first evaluation we revised and improved the annotation procedure. The second evaluation showed a significant improvement of the agreement between annotators. The current work was focused on annotation and categorisation of event mentions in text. The future work will be focused on description of event with a set of attributes, arguments and relations.

  15. Ground Truth Annotation in T Analyst

    DEFF Research Database (Denmark)

    2015-01-01

    This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...

  16. Creating Gaze Annotations in Head Mounted Displays

    DEFF Research Database (Denmark)

    Mardanbeigi, Diako; Qvarfordt, Pernilla

    2015-01-01

    To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it possible to point out objects of interest within an image and add a verbal description. To create an annota- tion, ...

  17. Annotation of regular polysemy and underspecification

    DEFF Research Database (Denmark)

    Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

    2013-01-01

    We present the result of an annotation task on regular polysemy for a series of seman- tic classes or dot types in English, Dan- ish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...

  18. Harnessing Collaborative Annotations on Online Formative Assessments

    Science.gov (United States)

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  19. The surplus value of semantic annotations

    NARCIS (Netherlands)

    M. Marx

    2010-01-01

    We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries, an

  20. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

    Science.gov (United States)

    Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  1. KEGG as a reference resource for gene and protein annotation.

    Science.gov (United States)

    Kanehisa, Minoru; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2016-01-04

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.

  2. Manual Annotation of Translational Equivalence The Blinker Project

    CERN Document Server

    Melamed, I D

    1998-01-01

    Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed . The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially-designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably rel...

  3. Concept annotation in the CRAFT corpus

    Directory of Open Access Journals (Sweden)

    Bada Michael

    2012-07-01

    Full Text Available Abstract Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released. Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens, our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection, the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are

  4. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to little functional annotation associated with these arrays. The Affymetrix GenChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO. However the GO annotation data presented by Affymetrix is incomplete, for example, they do not show references linked to manually annotated functions. In addition, there is no tool that facilitates microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers amount of time in searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM tool to help researchers to quickly retrieve corresponding functional information for their dataset. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using AGOM tool. The disease, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and

  5. A Common XML-based Framework for Syntactic Annotations

    CERN Document Server

    Ide, Nancy; Erjavec, Tomaz

    2009-01-01

    It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becoming increasingly mandatory. To answer this need, we have developed a framework comprised of an abstract model for a variety of different annotation types (e.g., morpho-syntactic tagging, syntactic annotation, co-reference annotation, etc.), which can be instantiated in different ways depending on the annotator's approach and goals. In this paper we provide an overview of the framework, demonstrate its applicability to syntactic annotation, and show how it can contribute to comparative evaluation of parser output and diverse syntactic annotation schemes.

  6. Making web annotations persistent over time

    Energy Technology Data Exchange (ETDEWEB)

    Sanderson, Robert [Los Alamos National Laboratory; Van De Sompel, Herbert [Los Alamos National Laboratory

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.

  7. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Full Text Available Abstract Background The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results SeqAnt (Sequence Annotator is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data completely ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.

  8. Computational annotation of genes differentially expressed along olive fruit development

    Directory of Open Access Journals (Sweden)

    Martinelli Federico

    2009-10-01

    used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening.

  9. Annotation Style Guide for the Blinker Project

    CERN Document Server

    Melamed, I D

    1998-01-01

    This annotation style guide was created by and for the Blinker project at the University of Pennsylvania. The Blinker project was so named after the ``bilingual linker'' GUI, which was created to enable bilingual annotators to ``link'' word tokens that are mutual translations in parallel texts. The parallel text chosen for this project was the Bible, because it is probably the easiest text to obtain in electronic form in multiple languages. The languages involved were English and French, because, of the languages with which the project co-ordinator was familiar, these were the two for which a sufficient number of annotators was likely to be found.

  10. DIMA – Annotation guidelines for German intonation

    DEFF Research Database (Denmark)

    Kügler, Frank; Smolibocki, Bernadett; Arnold, Denis

    2015-01-01

    easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue......This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups...

  11. Crowdsourcing and annotating NER for Twitter #drift

    DEFF Research Database (Denmark)

    Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

    2014-01-01

    We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a......) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can beobtained from crowdsourced annotations, making it more feasible...

  12. Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny

    Directory of Open Access Journals (Sweden)

    Craig Douglas B

    2012-05-01

    Full Text Available Abstract Background The rabbit is an important model organism used in a wide range of biomedical research. However, the rabbit genome is still sparsely annotated, thus prohibiting extensive functional analysis of gene sets derived from whole-genome experiments. We developed a web-based application that provides augmented annotation and orthologue analysis for rabbit genes. Importantly, the application allows comprehensive functional analysis through the use of orthologous relationships. Results Using data extracted from several public bioinformatics repositories we created Better Bunny, a database and query tool that extensively augments the available functional annotation for rabbit genes. Using the complete set of target genes from a commercial rabbit gene expression microarray as our benchmark, we are able to obtain functional information for 88 % of the genes on the microarray. Previously, functional information was available for fewer than 10 % of the rabbit genes. Conclusions We have developed a freely available, web-accessible bioinformatics tool that enables investigators to quickly and easily perform extensive functional analysis of rabbit genes (http://cptweb.cpt.wayne.edu. The software application fills a critical void for a wide range of biomedical research that relies on the rabbit model and requires characterization of biological function for large sets of genes.

  13. Meteor showers an annotated catalog

    CERN Document Server

    Kronk, Gary W

    2014-01-01

    Meteor showers are among the most spectacular celestial events that may be observed by the naked eye, and have been the object of fascination throughout human history. In “Meteor Showers: An Annotated Catalog,” the interested observer can access detailed research on over 100 annual and periodic meteor streams in order to capitalize on these majestic spectacles. Each meteor shower entry includes details of their discovery, important observations and orbits, and gives a full picture of duration, location in the sky, and expected hourly rates. Armed with a fuller understanding, the amateur observer can better view and appreciate the shower of their choice. The original book, published in 1988, has been updated with over 25 years of research in this new and improved edition. Almost every meteor shower study is expanded, with some original minor showers being dropped while new ones are added. The book also includes breakthroughs in the study of meteor showers, such as accurate predictions of outbursts as well ...

  14. SASL: A Semantic Annotation System for Literature

    Science.gov (United States)

    Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin; Jin, Hai

    Due to ambiguity, search engines for scientific literatures may not return right search results. One efficient solution to the problems is to automatically annotate literatures and attach the semantic information to them. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviation and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as knowledge base to annotate literatures. SASL mainly attaches semantic to terminology, academic institutions, conferences, and journals etc. Many of them are usually abbreviations, which induces ambiguity. Here, SASL uses regular expressions to extract the mapping between full name of entities and their abbreviation. Since full names of several entities may map to a single abbreviation, SASL introduces Hidden Markov Model to implement name disambiguation. Finally, the paper presents the experimental results, which confirm SASL a good performance.

  15. Annotation and retrieval in protein interaction databases

    Science.gov (United States)

    Cannataro, Mario; Hiram Guzzi, Pietro; Veltri, Pierangelo

    2014-06-01

    Biological databases have been developed with a special focus on the efficient retrieval of single records or the efficient computation of specialized bioinformatics algorithms against the overall database, such as in sequence alignment. The continuos production of biological knowledge spread on several biological databases and ontologies, such as Gene Ontology, and the availability of efficient techniques to handle such knowledge, such as annotation and semantic similarity measures, enable the development on novel bioinformatics applications that explicitly use and integrate such knowledge. After introducing the annotation process and the main semantic similarity measures, this paper shows how annotations and semantic similarity can be exploited to improve the extraction and analysis of biologically relevant data from protein interaction databases. As case studies, the paper presents two novel software tools, OntoPIN and CytoSeVis, both based on the use of Gene Ontology annotations, for the advanced querying of protein interaction databases and for the enhanced visualization of protein interaction networks.

  16. Modeling Social Annotation: a Bayesian Approach

    CERN Document Server

    Plangprasopchok, Anon

    2008-01-01

    Collaborative tagging systems, such as del.icio.us, CiteULike, and others, allow users to annotate objects, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations, contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make best use of socially-generated data, since they do not take into account variations in individual users' perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes interests of individual annotators into account in order to find hidden topics of annotated objects. Unfortunately, our proposed approach had a number of shortcomings, including overfitting, local maxima and the requirement to specify values for some parameters. In this paper we address these shortcomings in two ways. First, we extend the model to a fully Bayesian framework. Second, we describe an infinite ver...

  17. Annotation of Scientific Summaries for Information Retrieval

    CERN Document Server

    Ibekwe-Sanjuan, Fidelia; Eric, Sanjuan; Eric, Charton

    2011-01-01

    We present a methodology combining surface NLP and Machine Learning techniques for ranking asbtracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence is bearing (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi- abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised in order to take into account the distribution of the tags in the corpus. Results, although still preliminary, are encouraging us to pursue this line of work and find better ways of building IR systems that can take into account semantic annotations in a corpus.

  18. Development and Validation of 697 Novel Polymorphic Genomic and EST-SSR Markers in the American Cranberry (Vaccinium macrocarpon Ait.

    Directory of Open Access Journals (Sweden)

    Brandon Schlautman

    2015-01-01

    Full Text Available The American cranberry, Vaccinium macrocarpon Ait., is an economically important North American fruit crop that is consumed because of its unique flavor and potential health benefits. However, a lack of abundant, genome-wide molecular markers has limited the adoption of modern molecular assisted selection approaches in cranberry breeding programs. To increase the number of available markers in the species, this study identified, tested, and validated microsatellite markers from existing nuclear and transcriptome sequencing data. In total, new primers were designed, synthesized, and tested for 979 SSR loci; 697 of the markers amplified allele patterns consistent with single locus segregation in a diploid organism and were considered polymorphic. Of the 697 polymorphic loci, 507 were selected for additional genetic diversity and segregation analyses in 29 cranberry genotypes. More than 95% of the 507 loci did not display segregation distortion at the p < 0.05 level, and contained moderate to high levels of polymorphism with a polymorphic information content >0.25. This comprehensive collection of developed and validated microsatellite loci represents a substantial addition to the molecular tools available for geneticists, genomicists, and breeders in cranberry and Vaccinium.

  19. Development of EST-SSR markers for the study of population structure in lettuce (Lactuca sativa L.).

    Science.gov (United States)

    Simko, Ivan

    2009-01-01

    A set of 61 simple sequence repeat (SSR) markers was developed from the 19,523 Lactuca sativa and Lactuca serriola unigenes. Approximately 4.5% of the unigenes contained a perfect SSR at least 20 bp long, corresponding to roughly 1 perfect SSR per 14.7 kb. Marker polymorphism was tested on a set comprising 96 accessions representing all major horticultural types and 3 wild species (L. serriola, Lactuca saligna, and Lactuca virosa). Both the average marker heterozygosity (UHe = 0.32) and the number of different alleles per locus (Na = 3.56) were significantly reduced in expressed sequence tag (EST)-SSRs as compared with anonymous SSRs (UHe = 0.59, Na = 5.53). Marker transfer rate to the wild species corresponded to the decreasing sexual compatibility with L. sativa and was higher for EST-SSRs (100% L. serriola, 87% L. saligna, and 75% L. virosa) than for anonymous SSRs (93%, 66%, and 42%, respectively). Assessment of population structure among 90 L. sativa cultivars with SSRs was in good agreement with classification into the horticultural types. The average marker heterozygosity was smallest in iceberg (0.097), Latin (0.140), and romaine-type (0.151) cultivars while highest in leaf (green leaf 0.208 and red leaf 0.240) lettuces. The level of marker heterozygosity is in accord with morphological variability observed in different horticultural types.

  20. A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study

    Directory of Open Access Journals (Sweden)

    Cherubini Marcello

    2010-10-01

    Full Text Available Abstract Background Expressed Sequence Tags (ESTs are a source of simple sequence repeats (SSRs that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of a SSR-based linkage map of oak for quantitative trait locus (QTL mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and address the transferability of EST-SSRs in Castanea sativa (chestnut. Results A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes from which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall 37.7% (283 were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental tree for which framework linkage maps based on AFLP markers were available. The bin set consisting of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a phylogenetically related species to oak, was higher. Conclusion We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.

  1. Fluid Annotations in a Open World

    DEFF Research Database (Denmark)

    Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning

    2001-01-01

    Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to l...... to layer fluid annotations and links on top of abitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required....

  2. Semantic Annotation to Support Automatic Taxonomy Classification

    DEFF Research Database (Denmark)

    Kim, Sanghee; Ahmed, Saeema; Wallace, Ken

    2006-01-01

    , the annotations identify which parts of a text are more important for understanding its contents. The extraction of salient sentences is a major issue in text summarisation. Commonly used methods are based on statistical analysis, but for subject-matter type texts, linguistically motivated natural language...... processing techniques, like semantic annotations, are preferred. An experiment to test the method using 140 documents collected from industry demonstrated that classification accuracy can be improved by up to 16%....

  3. Collaborative annotation of 3D crystallographic models.

    Science.gov (United States)

    Hunter, J; Henderson, M; Khan, I

    2007-01-01

    This paper describes the AnnoCryst system-a tool that was designed to enable authenticated collaborators to share online discussions about 3D crystallographic structures through the asynchronous attachment, storage, and retrieval of annotations. Annotations are personal comments, interpretations, questions, assessments, or references that can be attached to files, data, digital objects, or Web pages. The AnnoCryst system enables annotations to be attached to 3D crystallographic models retrieved from either private local repositories (e.g., Fedora) or public online databases (e.g., Protein Data Bank or Inorganic Crystal Structure Database) via a Web browser. The system uses the Jmol plugin for viewing and manipulating the 3D crystal structures but extends Jmol by providing an additional interface through which annotations can be created, attached, stored, searched, browsed, and retrieved. The annotations are stored on a standardized Web annotation server (Annotea), which has been extended to support 3D macromolecular structures. Finally, the system is embedded within a security framework that is capable of authenticating users and restricting access only to trusted colleagues.

  4. An annotation based approach to support design communication

    CERN Document Server

    Hisarciklilar, Onur

    2007-01-01

    The aim of this paper is to propose an approach based on the concept of annotation for supporting design communication. In this paper, we describe a co-operative design case study where we analyse some annotation practices, mainly focused on design minutes recorded during project reviews. We point out specific requirements concerning annotation needs. Based on these requirements, we propose an annotation model, inspired from the Speech Act Theory (SAT) to support communication in a 3D digital environment. We define two types of annotations in the engineering design context, locutionary and illocutionary annotations. The annotations we describe in this paper are materialised by a set of digital artefacts, which have a semantic dimension allowing express/record elements of technical justifications, traces of contradictory debates, etc. In this paper, we first clarify the semantic annotation concept, and we define general properties of annotations in the engineering design context, and the role of annotations in...

  5. Discovering gene annotations in biomedical text databases

    Directory of Open Access Journals (Sweden)

    Ozsoyoglu Gultekin

    2008-03-01

    Full Text Available Abstract Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need forautomated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discoveredknowledge into Gene Ontology (GO concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching to a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN has reached to the precision level of 78% at therecall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i automating the annotation of genomic entities with Gene Ontology concepts, and (ii providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exactmatching" with the advantage of locating approximate

  6. GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations.

    Science.gov (United States)

    Sangrador-Vegas, Amaia; Mitchell, Alex L; Chang, Hsin-Yu; Yong, Siew-Yit; Finn, Robert D

    2016-01-01

    The removal of annotation from biological databases is often perceived as an indicator of erroneous annotation. As a corollary, annotation stability is considered to be a measure of reliability. However, diverse data-driven events can affect the stability of annotations in both primary protein sequence databases and the protein family databases that are built upon the sequence databases and used to help annotate them. Here, we describe some of these events and their consequences for the InterPro database, and demonstrate that annotation removal or reassignment is not always linked to incorrect annotation by the curator. Database URL: http://www.ebi.ac.uk/interpro.

  7. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Full Text Available Abstract Background Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

  8. Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation

    Directory of Open Access Journals (Sweden)

    Qaiser ABBAS

    2016-12-01

    Full Text Available This work elaborates the semi-semantic part of speech annotation guidelines for the URDU.KON-TB treebank: an annotated corpus. A hierarchical annotation scheme was designed to label the part of speech and then applied on the corpus. This raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part of speech labels. The corpus contains text of local & international news, social stories, sports, culture, finance, religion, traveling, etc. This exercise finally contributed a part of speech annotation to the URDU.KON-TB treebank. Twenty-two main part of speech categories are divided into subcategories, which conclude the morphological, and semantical information encoded in it. This article reports the annotation guidelines in major; however, it also briefs the development of the URDU.KON-TB treebank, which includes the raw corpus collection, designing & employment of annotation scheme and finally, its statistical evaluation and results. The guidelines presented as follows, will be useful for linguistic community to annotate the sentences not only for the national language Urdu but for the other indigenous languages like Punjab, Sindhi, Pashto, etc., as well.

  9. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  10. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Science.gov (United States)

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

    The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  11. Genome Annotation Transfer Utility (GATU: rapid annotation of viral genomes using a closely related reference genome

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2006-06-01

    Full Text Available Abstract Background Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center http://www.biovirus.org and Viral Bioinformatics – Canada http://www.virology.ca, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU, to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. Results GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. Conclusion GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms by transferring reference genome

  12. Automated analysis and annotation of basketball video

    Science.gov (United States)

    Saur, Drew D.; Tan, Yap-Peng; Kulkarni, Sanjeev R.; Ramadge, Peter J.

    1997-01-01

    Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing and data mining projects. A successful video annotation system should provide users with useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation still remains hard by using current available techniques. However, a wide range video has inherent structure such that some prior knowledge about the video content can be exploited to improve our understanding of the high-level video semantic content. In this paper, we develop tools and techniques for analyzing structured video by using the low-level information available directly from MPEG compressed video. Being able to work directly in the video compressed domain can greatly reduce the processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from MPEG stream with the prior knowledge of basketball video structure to provide high level content analysis, annotation and browsing for events such as wide- angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times. We expect our approach can also be extended to structured video in other domains.

  13. Quantifying Variability of Manual Annotation in Cryo-Electron Tomograms.

    Science.gov (United States)

    Hecksel, Corey W; Darrow, Michele C; Dai, Wei; Galaz-Montoya, Jesús G; Chin, Jessica A; Mitchell, Patrick G; Chen, Shurui; Jakana, Jemba; Schmid, Michael F; Chiu, Wah

    2016-06-01

    Although acknowledged to be variable and subjective, manual annotation of cryo-electron tomography data is commonly used to answer structural questions and to create a "ground truth" for evaluation of automated segmentation algorithms. Validation of such annotation is lacking, but is critical for understanding the reproducibility of manual annotations. Here, we used voxel-based similarity scores for a variety of specimens, ranging in complexity and segmented by several annotators, to quantify the variation among their annotations. In addition, we have identified procedures for merging annotations to reduce variability, thereby increasing the reliability of manual annotation. Based on our analyses, we find that it is necessary to combine multiple manual annotations to increase the confidence level for answering structural questions. We also make recommendations to guide algorithm development for automated annotation of features of interest.

  14. Corpus annotation for mining biomedical events from literature

    Directory of Open Access Journals (Sweden)

    Tsujii Jun'ichi

    2008-01-01

    Full Text Available Abstract Background Advanced Text Mining (TM such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech, syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1 to design a scheme of annotation which meets specific requirements of text annotation, (2 to achieve biology-oriented annotation which reflect biologists' interpretation of text, and (3 to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing-based TM in the bio-medical domain.

  15. Protein function annotation by local binding site surface similarity.

    Science.gov (United States)

    Spitzer, Russell; Cleves, Ann E; Varela, Rocco; Jain, Ajay N

    2014-04-01

    Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex-PSIM, a previously reported surface-based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ∼60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins.

  16. Critical Assessment of Function Annotation Meeting, 2011

    Energy Technology Data Exchange (ETDEWEB)

    Friedberg, Iddo

    2015-01-21

    The Critical Assessment of Function Annotation meeting was held July 14-15, 2011 at the Austria Conference Center in Vienna, Austria. There were 73 registered delegates at the meeting. We thank the DOE for this award. It helped us organize and support a scientific meeting AFP 2011 as a special interest group (SIG) meeting associated with the ISMB 2011 conference. The conference was held in Vienna, Austria, in July 2011. The AFP SIG was held on July 15-16, 2011 (immediately preceding the conference). The meeting consisted of two components, the first being a series of talks (invited and contributed) and discussion sections dedicated to protein function research, with an emphasis on the theory and practice of computational methods utilized in functional annotation. The second component provided a large-scale assessment of computational methods through participation in the Critical Assessment of Functional Annotation (CAFA).

  17. I2Cnet medical image annotation service.

    Science.gov (United States)

    Chronaki, C E; Zabulis, X; Orphanoudakis, S C

    1997-01-01

    I2Cnet (Image Indexing by Content network) aims to provide services related to the content-based management of images in healthcare over the World-Wide Web. Each I2Cnet server maintains an autonomous repository of medical images and related information. The annotation service of I2Cnet allows specialists to interact with the contents of the repository, adding comments or illustrations to medical images of interest. I2Cnet annotations may be communicated to other users via e-mail or posted to I2Cnet for inclusion in its local repositories. This paper discusses the annotation service of I2Cnet and argues that such services pave the way towards the evolution of active digital medical image libraries.

  18. Annotating images by mining image search results.

    Science.gov (United States)

    Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

    2008-11-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged-one is to map the high-dimensional image visual features into hash codes, the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.

  19. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    Full Text Available We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

  20. Solar Tutorial and Annotation Resource (STAR)

    Science.gov (United States)

    Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

    2009-12-01

    We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our “STAR" suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans’ superior ability (over computers) to identify many events. Annotation tools include time slice generation to quantify loop oscillations, the interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably utilize annotation software with new data. Upon completion of our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the data analysis of very large datasets from future solar satellites, and increased public interest and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO’s advancements in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects develop automated feature recognition software, so that computers can complete much of the initial event tagging and analysis, still, that software cannot annotate features such as sigmoids, coronal magnetic loops, coronal dimming, etc., due to large amounts of data concentrated in relatively small areas. Previously, solar physicists manually annotated these features, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to efficiently process these data. Providing analysis tools and data access to students and the public have proven

  1. Ranking Biomedical Annotations with Annotator’s Semantic Relevancy

    Directory of Open Access Journals (Sweden)

    Aihua Wu

    2014-01-01

    Full Text Available Biomedical annotation is a common and affective artifact for researchers to discuss, show opinion, and share discoveries. It becomes increasing popular in many online research communities, and implies much useful information. Ranking biomedical annotations is a critical problem for data user to efficiently get information. As the annotator’s knowledge about the annotated entity normally determines quality of the annotations, we evaluate the knowledge, that is, semantic relationship between them, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second way is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to user’s vote and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when data set is large.

  2. Annotated Bibliography of EDGE2D Use

    Energy Technology Data Exchange (ETDEWEB)

    J.D. Strachan and G. Corrigan

    2005-06-24

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables.

  3. Bibliografia de Aztlan: An Annotated Chicano Bibliography.

    Science.gov (United States)

    Barrios, Ernie, Ed.

    More than 300 books and articles published from 1920 to 1971 are reviewed in this annotated bibliography of literature on the Chicano. The citations and reviews are categorized by subject area and deal with contemporary Chicano history, education, health, history of Mexico, literature, native Americans, philosophy, political science, pre-Columbian…

  4. Nutrition & Adolescent Pregnancy: A Selected Annotated Bibliography.

    Science.gov (United States)

    National Agricultural Library (USDA), Washington, DC.

    This annotated bibliography on nutrition and adolescent pregnancy is intended to be a source of technical assistance for nurses, nutritionists, physicians, educators, social workers, and other personnel concerned with improving the health of teenage mothers and their babies. It is divided into two major sections. The first section lists selected…

  5. Structuring and presenting annotated media repositories

    NARCIS (Netherlands)

    Rutledge, L.; Ossenbruggen, J.R. van; Hardman, L.

    2004-01-01

    The Semantic Web envisions a Web that is both human readable and machine processible. In practice, however, there is still a large conceptual gap between annotated content repositories on the one hand, and coherent, human readable Web pages on the other. To bridge this conceptual gap, one needs to s

  6. An Annotated Bibliography in Financial Therapy

    Directory of Open Access Journals (Sweden)

    Dorothy B. Durband

    2010-10-01

    Full Text Available The following annotated bibliography contains a summary of articles and websites, as well as a list of books related to financial therapy. The resources were compiled through e-mail solicitation from members of the Financial Therapy Forum in November 2008. Members of the forum are marked with an asterisk.

  7. Skin Cancer Education Materials: Selected Annotations.

    Science.gov (United States)

    National Cancer Inst. (NIH), Bethesda, MD.

    This annotated bibliography presents 85 entries on a variety of approaches to cancer education. The entries are grouped under three broad headings, two of which contain smaller sub-divisions. The first heading, Public Education, contains prevention and general information, and non-print materials. The second heading, Professional Education,…

  8. Learning to search for images without annotations

    NARCIS (Netherlands)

    Kordumova, S.

    2016-01-01

    Humans are adjusted to the environment and can easily recognize what they see around them or in images. Machines, however, cannot recognize images unless trained to do so. The usual approach is to annotate images with what they capture and train a machine learning algorithm. This thesis focuses on a

  9. Small Group Communication: An Annotated Bibliography.

    Science.gov (United States)

    Gouran, Dennis S.; Guadagnino, Christopher S.

    This annotated bibliography includes sources of information that are primarily concerned with problem solving, decision making, and processes of social influence in small groups, and secondarily deal with other aspects of communication and interaction in groups, such as conflict management and negotiation. The 57 entries, all dating from 1980…

  10. Ludwig von Mises: An Annotated Bibliography.

    Science.gov (United States)

    Gordon, David

    A 117-item annotated bibliography of books, articles, essays, lectures, and reviews by economist Ludwig von Mises is presented. The bibliography is arranged chronologicaly, and is followed by an alphabetical listing of the citations, excluding books. An index and information on the Ludwig von Mises Institute at Auburn University (Alabama) are…

  11. Political Campaign Debating: A Selected, Annotated Bibliography.

    Science.gov (United States)

    Ritter, Kurt; Hellweg, Susan A.

    Noting that television debates have become a regular feature of the media politics by which candidates seek office, this annotated bibliography is particularly intended to assist teachers and researchers of debate, argumentation, and political communication. The 40 citations are limited to the television era of American politics and categorized as…

  12. A Partially Annotated Political Communication Bibliography.

    Science.gov (United States)

    Thornton, Barbara C.

    This 63-page annotated bibliography contains available materials in the area of political communication, a relatively new field of political science. Political communication includes facets of the election process and interaction between political parties and the voter. A variety of materials dating from 1960 to 1972 include books, pamphlets,…

  13. College Students in Transition: An Annotated Bibliography

    Science.gov (United States)

    Foote, Stephanie M., Ed.; Hinkle, Sara M., Ed.; Kranzow, Jeannine, Ed.; Pistilli, Matthew D., Ed.; Miles, LaTonya Rease, Ed.; Simmons, Jannell G., Ed.

    2013-01-01

    The transition from high school to college is an important milestone, but it is only one of many steps in the journey through higher education. This volume is an annotated bibliography of the emerging literature examining the many other transitions students make beyond the first year, including the sophomore year, the transfer experience, and the…

  14. Greeks in Canada (an Annotated Bibliography).

    Science.gov (United States)

    Bombas, Leonidas C.

    This bibliography on Greeks in Canada includes annotated references to both published and (mostly) unpublished works. Among the 70 entries (arranged in alphabetical order by author) are articles, reports, papers, and theses that deal either exclusively with or include a separate section on Greeks in the various Canadian provinces. (GC)

  15. Reflective Annotations: On Becoming a Scholar

    Science.gov (United States)

    Alexander, Mark; Taylor, Caroline; Greenberger, Scott; Watts, Margie; Balch, Riann

    2012-01-01

    This article presents the authors' reflective annotations on becoming a scholar. This paper begins with a discussion on socialization for teaching, followed by a discussion on socialization for service and sense of belonging. Then, it describes how the doctoral process evolves. Finally, it talks about adult learners who pursue doctoral education.

  16. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  17. La Mujer Chicana: An Annotated Bibliography, 1976.

    Science.gov (United States)

    Chapa, Evey, Ed.; And Others

    Intended to provide interested persons, researchers, and educators with information about "la mujer Chicana", this annotated bibliography cites 320 materials published between 1916 and 1975, with the majority being between 1960 and 1975. The 12 sections cover the following subject areas: Chicana publications; Chicana feminism and…

  18. Mulligan Concept manual therapy: standardizing annotation.

    Science.gov (United States)

    McDowell, Jillian Marie; Johnson, Gillian Margaret; Hetherington, Barbara Helen

    2014-10-01

    Quality technique documentation is integral to the practice of manual therapy, ensuring uniform application and reproducibility of treatment. Manual therapy techniques are described by annotations utilizing a range of acronyms, abbreviations and universal terminology based on biomechanical and anatomical concepts. The various combinations of therapist and patient generated forces utilized in a variety of weight-bearing positions, which are synonymous with Mulligan Concept, challenge practitioners existing annotational skills. An annotation framework with recording rules adapted to the Mulligan Concept is proposed in which the abbreviations incorporate established manual therapy tenets and are detailed in the following sequence of; starting position, side, joint/s, method of application, glide/s, Mulligan technique, movement (or function), whether an assistant is used, overpressure (and by whom) and numbers of repetitions or time and sets. Therapist or patient application of overpressure and utilization of treatment belts or manual techniques must be recorded to capture the complete description. The adoption of the Mulligan Concept annotation framework in this way for documentation purposes will provide uniformity and clarity of information transfer for the future purposes of teaching, clinical practice and audit for its practitioners.

  19. Statistical mechanics of ontology based annotations

    CERN Document Server

    Hoyle, David C

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotate...

  20. La Mujer Chicana: An Annotated Bibliography, 1976.

    Science.gov (United States)

    Chapa, Evey, Ed.; And Others

    Intended to provide interested persons, researchers, and educators with information about "la mujer Chicana", this annotated bibliography cites 320 materials published between 1916 and 1975, with the majority being between 1960 and 1975. The 12 sections cover the following subject areas: Chicana publications; Chicana feminism and "el movimiento";…

  1. Communication and Politics: A Selected, Annotated Bibliography.

    Science.gov (United States)

    Kaid, Lynda Lee; And Others

    Noting that the study of communication in political settings is an increasingly popular and important area of teaching and research in many disciplines, this 51-item annotated bibliography reflects the interdisciplinary nature of the field and is designed to incorporate varying approaches to the subject matter. With few exceptions, the books and…

  2. Genotyping and annotation of Affymetrix SNP arrays

    DEFF Research Database (Denmark)

    Lamy, Philippe; Andersen, Claus Lindbjerg; Wikman, Friedrik;

    2006-01-01

    allows us to annotate SNPs that have poor performance, either because of poor experimental conditions or because for one of the alleles the probes do not behave in a dose-response manner. Generally, our method agrees well with a method developed by Affymetrix. When both methods make a call they agree...

  3. Suggested Books for Children: An Annotated Bibliography

    Science.gov (United States)

    NHSA Dialog, 2008

    2008-01-01

    This article provides an annotated bibliography of various children's books. It includes listings of books that illustrate the dynamic relationships within the natural environment, economic context, racial and cultural identities, cross-group similarities and differences, gender, different abilities and stories of injustice and resistance.

  4. Studies of Scientific Disciplines. An Annotated Bibliography.

    Science.gov (United States)

    Weisz, Diane; Kruytbosch, Carlos

    Provided in this bibliography are annotated lists of social studies of science literature, arranged alphabetically by author in 13 disciplinary areas. These areas include astronomy; general biology; biochemistry and molecular biology; biomedicine; chemistry; earth and space sciences; economics; engineering; mathematics; physics; political science;…

  5. DNAVis: interactive visualization of comparative genome annotations

    NARCIS (Netherlands)

    Fiers, M.W.E.J.; Wetering, van de H.; Peeters, T.H.J.M.; Wijk, van J.J.; Nap, J.P.H.

    2006-01-01

    The software package DNAVis offers a fast, interactive and real-time visualization of DNA sequences and their comparative genome annotations. DNAVis implements advanced methods of information visualization such as linked views, perspective walls and semantic zooming, in addition to the display of he

  6. Computer systems for annotation of single molecule fragments

    Science.gov (United States)

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.

  7. Annotation Method (AM): SE22_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available ether with predicted molecular formulae and putative structures, were provided as metabolite annotations. Comparison with public data...bases was performed. A grading system was introduced to describe the evidence supporting the annotations. ...

  8. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kasuza sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  9. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Full Text Available Abstract Background Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 – 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  10. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  11. On Semantic Annotation in Clarin-PL Parallel Corpora

    OpenAIRE

    Violetta Koseska-Toszewa; Roman Roszko

    2015-01-01

    On Semantic Annotation in Clarin-PL Parallel Corpora In the article, the authors present a proposal for semantic annotation in Clarin-PL parallel corpora: Polish-Bulgarian-Russian and Polish-Lithuanian ones. Semantic annotation of quantification is a novum in developing sentence level semantics in multilingual parallel corpora. This is why our semantic annotation is manual. The authors hope it will be interesting to IT specialists working on automatic processing of the given natural langu...

  12. AnnaBot: A Static Verifier for Java Annotation Usage

    OpenAIRE

    Ian Darwin

    2010-01-01

    This paper describes AnnaBot, one of the first tools to verify correct use of Annotation-based metadata in the Java programming language. These Annotations are a standard Java 5 mechanism used to attach metadata to types, methods, or fields without using an external configuration file. A binary representation of the Annotation becomes part of the compiled “.class” file, for inspection by another component or library at runtime. Java Annotations were introduced into the Java language in ...

  13. Interoperable Multimedia Annotation and Retrieval for the Tourism Sector

    NARCIS (Netherlands)

    Chatzitoulousis, Antonios; Efraimidis, Pavlos S.; Athanasiadis, I.N.

    2015-01-01

    The Atlas Metadata System (AMS) employs semantic web annotation techniques in order to create an interoperable information annotation and retrieval platform for the tourism sector. AMS adopts state-of-the-art metadata vocabularies, annotation techniques and semantic web technologies. Interoperabilit

  14. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.;

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced...

  15. Automatic annotation of head velocity and acceleration in Anvil

    DEFF Research Database (Denmark)

    Jongejan, Bart

    2012-01-01

    We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements...

  16. Systematic interpretation of microarray data using experiment annotations

    Directory of Open Access Journals (Sweden)

    Frohme Marcus

    2006-12-01

    Full Text Available Abstract Background Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.

  17. Model and Interoperability using Meta Data Annotations

    Science.gov (United States)

    David, O.

    2011-12-01

    Software frameworks and architectures are in need for meta data to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format, consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing meta data, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings, however the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach for meta data representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programing language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interfaces (API) calls is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to its originating code. Since models and

  18. How well are protein structures annotated in secondary databases?

    Science.gov (United States)

    Rother, Kristian; Michalsky, Elke; Leser, Ulf

    2005-09-01

    We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

  19. FragKB: structural and literature annotation resource of conserved peptide fragments and residues.

    Directory of Open Access Journals (Sweden)

    Ashish V Tendulkar

    Full Text Available BACKGROUND: FragKB (Fragment Knowledgebase is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.

  20. A Novel Technique to Image Annotation using Neural Network

    Directory of Open Access Journals (Sweden)

    Pankaj Savita

    2013-03-01

    Full Text Available : Automatic annotation of digital pictures is a key technology for managing and retrieving images from large image collection. Traditional image semantics extraction and representation schemes were commonly divided into two categories, namely visual features and text annotations. However, visual feature scheme are difficult to extract and are often semantically inconsistent. On the other hand, the image semantics can be well represented by text annotations. It is also easier to retrieve images according to their annotations. Traditional image annotation techniques are time-consuming and requiring lots of human effort. In this paper we propose Neural Network based a novel approach to the problem of image annotation. These approaches are applied to the Image data set. Our main work is focused on the image annotation by using multilayer perceptron, which exhibits a clear-cut idea on application of multilayer perceptron with special features. MLP Algorithm helps us to discover the concealed relations between image data and annotation data, and annotate image according to such relations. By using this algorithm we can save more memory space, and in case of web applications, transferring of images and download should be fast. This paper reviews 50 image annotation systems using supervised machine learning Techniques to annotate images for image retrieval. Results obtained show that the multi layer perceptron Neural Network classifier outperforms conventional DST Technique.

  1. Cognition inspired framework for indoor scene annotation

    Science.gov (United States)

    Ye, Zhipeng; Liu, Peng; Zhao, Wei; Tang, Xianglong

    2015-09-01

    We present a simple yet effective scene annotation framework based on a combination of bag-of-visual words (BoVW), three-dimensional scene structure estimation, scene context, and cognitive theory. From a macroperspective, the proposed cognition-based hybrid motivation framework divides the annotation problem into empirical inference and real-time classification. Inspired by the inference ability of human beings, common objects of indoor scenes are defined for experience-based inference, while in the real-time classification stage, an improved BoVW-based multilayer abstract semantics labeling method is proposed by introducing abstract semantic hierarchies to narrow the semantic gap and improve the performance of object categorization. The proposed framework was evaluated on a variety of common data sets and experimental results proved its effectiveness.

  2. Image Semantic Automatic Annotation by Relevance Feedback

    Institute of Scientific and Technical Information of China (English)

    ZHANG Tong-zhen; SHEN Rui-min

    2007-01-01

    A large semantic gap exists between content based index retrieval (CBIR) and high-level semantic, additional semantic information should be attached to the images, it refers in three respects including semantic representation model, semantic information building and semantic retrieval techniques. In this paper, we introduce an associated semantic network and an automatic semantic annotation system. In the system, a semantic network model is employed as the semantic representation model, it uses semantic keywords, linguistic ontology and low-level features in semantic similarity calculating. Through several times of users' relevance feedback, semantic network is enriched automatically. To speed up the growth of semantic network and get a balance annotation, semantic seeds and semantic loners are employed especially.

  3. A Concept Annotation System for Clinical Records

    CERN Document Server

    Kang, Ning; Afzal, Zubair; Singh, Bharat; Schuemie, Martijn J; van Mulligen, Erik M; Kors, Jan A

    2010-01-01

    Unstructured information comprises a valuable source of data in clinical records. For text mining in clinical records, concept extraction is the first step in finding assertions and relationships. This study presents a system developed for the annotation of medical concepts, including medical problems, tests, and treatments, mentioned in clinical records. The system combines six publicly available named entity recognition system into one framework, and uses a simple voting scheme that allows to tune precision and recall of the system to specific needs. The system provides both a web service interface and a UIMA interface which can be easily used by other systems. The system was tested in the fourth i2b2 challenge and achieved an F-score of 82.1% for the concept exact match task, a score which is among the top-ranking systems. To our knowledge, this is the first publicly available clinical record concept annotation system.

  4. Cadec: A corpus of adverse drug event annotations.

    Science.gov (United States)

    Karimi, Sarvnaz; Metke-Jimenez, Alejandro; Kemp, Madonna; Wang, Chen

    2015-06-01

    CSIRO Adverse Drug Event Corpus (Cadec) is a new rich annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available at https://data.csiro.au.(1).

  5. Building a semantically annotated corpus of clinical texts.

    Science.gov (United States)

    Roberts, Angus; Gaizauskas, Robert; Hepple, Mark; Demetriou, George; Guo, Yikun; Roberts, Ian; Setzer, Andrea

    2009-10-01

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

  6. Deburring: an annotated bibliography. Volume VI

    Energy Technology Data Exchange (ETDEWEB)

    Gillespie, L.K.

    1980-07-01

    An annotated summary of 138 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese, and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a proces economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred.

  7. Deburring: an annotated bibliography. Volume V

    Energy Technology Data Exchange (ETDEWEB)

    Gillespie, L.K.

    1978-01-01

    An annotated summary of 204 articles and publications on burrs, burr prevention and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese and German language articles. Entries are indexed by deburring processes, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified as are the materials deburred.

  8. Cultural nationalism: a review and annotated bibliography

    OpenAIRE

    Woods, Eric Taylor

    2014-01-01

    This review and annotated bibliography is part of The State of Nationalism (SoN), a comprehensive guide to the study of nationalism. Topic of this first contribution is cultural nationalism. This concept generally refers to ideas and practices that relate to the intended revival of a purported national community’s culture. If political nationalism is focused on the achievement of political autonomy, cultural nationalism is focused on the cultivation of a nation.

  9. Nonlinear Deep Kernel Learning for Image Annotation.

    Science.gov (United States)

    Jiu, Mingyuan; Sahbi, Hichem

    2017-02-08

    Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each one involves a combination of several elementary or intermediate kernels, and results into a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semisupervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show clear gain, compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.

  10. Management Tool for Semantic Annotations in WSDL

    Science.gov (United States)

    Boissel-Dallier, Nicolas; Lorré, Jean-Pierre; Benaben, Frédérick

    Semantic Web Services add features to automate web services discovery and composition. A new standard called SAWSDL emerged recently as a W3C recommendation to add semantic annotations within web service descriptions (WSDL). In order to manipulate such information in Java program we need an XML parser. Two open-source libraries already exist (SAWSDL4J and Woden4SAWSDL) but they don't meet all our specific needs such as support for WSDL 1.1 and 2.0. This paper presents a new tool, called EasyWSDL, which is able to handle semantic annotations as well as to manage the full WSDL description thanks to a plug-in mechanism. This tool allows us to read/edit/create a WSDL description and related annotations thanks to a uniform API, in both 1.1 and 2.0 versions. This document compares these three libraries and presents its integration into Dragon the OW2 open-source SOA governance tool.

  11. Molecular characterization and genetic diversity of Jatropha curcas L. in Costa Rica

    Directory of Open Access Journals (Sweden)

    Marcela Vásquez-Mayorga

    2017-02-01

    Full Text Available We estimated the genetic diversity of 50 Jatropha curcas samples from the Costa Rican germplasm bank using 18 EST-SSR, one G-SSR and nrDNA-ITS markers. We also evaluated the phylogenetic relationships among samples using nuclear ribosomal ITS markers. Non-toxicity was evaluated using G-SSRs and SCARs markers. A Neighbor-Joining (NJ tree and a Maximum Likelihood (ML tree were constructed using SSR markers and ITS sequences, respectively. Heterozygosity was moderate (He = 0.346, but considerable compared to worldwide values for J. curcas. The PIC (PIC = 0.274 and inbreeding coefficient (f =  − 0.102 were both low. Clustering was not related to the geographical origin of accessions. International accessions clustered independently of collection sites, suggesting a lack of genetic structure, probably due to the wide distribution of this crop and ample gene flow. Molecular markers identified only one non-toxic accession (JCCR-24 from Mexico. This work is part of a countrywide effort to characterize the genetic diversity of the Jatropha curcas germplasm bank in Costa Rica.

  12. A relation based measure of semantic similarity for Gene Ontology annotations

    Directory of Open Access Journals (Sweden)

    Gaudin Benoit

    2008-11-01

    similarity between annotations that exploits all available information without introducing assumptions about the nature of the ontology or data. We preserve the principles underlying instance based methods of semantic similarity of terms at the annotation level. As a result our measure better describes the information contained in annotations associated with gene products and as a result is better suited to characterizing and classifying gene products through their annotations.

  13. An Unsupervised Model for Exploring Hierarchical Semantics from Social Annotations

    Science.gov (United States)

    Zhou, Mianwei; Bao, Shenghua; Wu, Xian; Yu, Yong

    This paper deals with the problem of exploring hierarchical semantics from social annotations. Recently, social annotation services have become more and more popular in Semantic Web. It allows users to arbitrarily annotate web resources, thus, largely lowers the barrier to cooperation. Furthermore, through providing abundant meta-data resources, social annotation might become a key to the development of Semantic Web. However, on the other hand, social annotation has its own apparent limitations, for instance, 1) ambiguity and synonym phenomena and 2) lack of hierarchical information. In this paper, we propose an unsupervised model to automatically derive hierarchical semantics from social annotations. Using a social bookmark service Del.icio.us as example, we demonstrate that the derived hierarchical semantics has the ability to compensate those shortcomings. We further apply our model on another data set from Flickr to testify our model's applicability on different environments. The experimental results demonstrate our model's efficiency.

  14. Applying Reliability Metrics to Co-Reference Annotation

    CERN Document Server

    Passonneau, R J

    1997-01-01

    Studies of the contextual and linguistic factors that constrain discourse phenomena such as reference are coming to depend increasingly on annotated language corpora. In preparing the corpora, it is important to evaluate the reliability of the annotation, but methods for doing so have not been readily available. In this report, I present a method for computing reliability of coreference annotation. First I review a method for applying the information retrieval metrics of recall and precision to coreference annotation proposed by Marc Vilain and his collaborators. I show how this method makes it possible to construct contingency tables for computing Cohen's Kappa, a familiar reliability metric. By comparing recall and precision to reliability on the same data sets, I also show that recall and precision can be misleadingly high. Because Kappa factors out chance agreement among coders, it is a preferable measure for developing annotated corpora where no pre-existing target annotation exists.

  15. A robust data-driven approach for gene ontology annotation

    OpenAIRE

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For th...

  16. Updating RNA-Seq analyses after re-annotation

    OpenAIRE

    Roberts, Adam; Schaeffer, Lorian; Pachter, Lior

    2013-01-01

    The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipel...

  17. TTC’15 Live Contest Case Study: Transformation of Java Annotations

    OpenAIRE

    Křikava, Filip; Monperrus, Martin

    2015-01-01

    International audience; Java 5 introduced annotations as a systematic mean to attach syntactic meta-data to various elements of Java source code. Since then, annotations have been extensively used by a number of libraries, frameworks and tools to conveniently extend behaviour of Java programs that would otherwise have to be done manually or synthesised from external resources. The annotations are usually processed through reflection and the extended behaviour is injected into Java classes usi...

  18. Literacy and Basic Education: A Selected, Annotated Bibliography. Annotated Bibliography #3.

    Science.gov (United States)

    Michigan State Univ., East Lansing. Non-Formal Education Information Center.

    A selected annotated bibliography on literacy and basic education, including contributions from practitioners in the worldwide non-formal education network and compiled for them, has three interrelated themes: integration of literacy programs with broader development efforts; the learner-centered or "psycho-social" approach to literacy,…

  19. Semantator: semantic annotator for converting biomedical text to linked data.

    Science.gov (United States)

    Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

    2013-10-01

    More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community who provided positive feedbacks. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference.

  20. Automatic medical X-ray image classification using annotation.

    Science.gov (United States)

    Zare, Mohammad Reza; Mueen, Ahmed; Seng, Woo Chaw

    2014-02-01

    The demand for automatically classification of medical X-ray images is rising faster than ever. In this paper, an approach is presented to gain high accuracy rate for those classes of medical database with high ratio of intraclass variability and interclass similarities. The classification framework was constructed via annotation using the following three techniques: annotation by binary classification, annotation by probabilistic latent semantic analysis, and annotation using top similar images. Next, final annotation was constructed by applying ranking similarity on annotated keywords made by each technique. The final annotation keywords were then divided into three levels according to the body region, specific bone structure in body region as well as imaging direction. Different weights were given to each level of the keywords; they are then used to calculate the weightage for each category of medical images based on their ground truth annotation. The weightage computed from the generated annotation of query image was compared with the weightage of each category of medical images, and then the query image would be assigned to the category with closest weightage to the query image. The average accuracy rate reported is 87.5 %.

  1. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Very many and valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations are reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations proved the effectiveness of our pipeline and prioritization of predicted annotations, by selecting as most likely manifold predicted annotations that were later confirmed.

  2. A Novel Approach to Semantic and Coreference Annotation at LLNL

    Energy Technology Data Exchange (ETDEWEB)

    Firpo, M

    2005-02-04

    A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommend in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.

  3. Review of actinide-sediment reactions with an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Ames, L.L.; Rai, D.; Serne, R.J.

    1976-02-10

    The annotated bibliography is divided into sections on chemistry and geochemistry, migration and accumulation, cultural distributions, natural distributions, and bibliographies and annual reviews. (LK)

  4. Supporting One-Time Point Annotations for Gesture Recognition.

    Science.gov (United States)

    Nguyen-Dinh, Long-Van; Calatroni, Alberto; Troester, Gerhard

    2016-12-08

    This paper investigates a new annotation technique that reduces significantly the amount of time to annotate training data for gesture recognition. Conventionally, the annotations comprise the start and end times, and the corresponding labels of gestures in sensor recordings. In this work, we propose a one-time point annotation in which labelers do not have to select the start and end time carefully, but just mark a one-time point within the time a gesture is happening. The technique gives more freedom and reduces significantly the burden for labelers. To make the one-time point annotations applicable, we propose a novel BoundarySearch algorithm to find automatically the correct temporal boundaries of gestures by discovering data patterns around their given one-time point annotations. The corrected annotations are then used to train gesture models. We evaluate the method on three applications from wearable gesture recognition with various gesture classes (10-17 classes) recorded with different sensor modalities. The results show that training on the corrected annotations can achieve performances close to a fully supervised training on clean annotations (lower by just up to 5% F1-score on average). Furthermore, the BoundarySearch algorithm is also evaluated on the ChaLearn 2014 multi-modal gesture recognition challenge recorded with Kinect sensors from computer vision and achieves similar results.

  5. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  6. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Science.gov (United States)

    Ely, Bert; Scott, LaTia Etheredge

    2014-01-01

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  7. Introduction to annotated logics foundations for paracomplete and paraconsistent reasoning

    CERN Document Server

    Abe, Jair Minoro; Nakamatsu, Kazumi

    2015-01-01

    This book is written as an introduction to annotated logics. It provides logical foundations for annotated logics, discusses some interesting applications of these logics and also includes the authors' contributions to annotated logics. The central idea of the book is to show how annotated logic can be applied as a tool to solve problems of technology and of applied science. The book will be of interest to pure and applied logicians, philosophers, and computer scientists as a monograph on a kind of paraconsistent logic. But, the layman will also take profit from its reading.

  8. Towards a Library of Standard Operating Procedures (SOPs) for (meta)genomic annotation

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Angiuoli, Samuel V.; Cochrane, Guy; Field, Dawn; Garrity, George; Gussman, Aaron; Kodira, Chinnappa D.; Klimke, William; Kyrpides, Nikos; Madupu, Ramana; Markowitz, Victor; Tatusova, Tatiana; Thomson, Nick; White, Owen

    2008-04-01

    Genome annotations describe the features of genomes and accompany sequences in genome databases. The methodologies used to generate genome annotation are diverse and typically vary amongst groups. Descriptions of the annotation procedure are helpful in interpreting genome annotation data. Standard Operating Procedures (SOPs) for genome annotation describe the processes that generate genome annotations. Some groups are currently documenting procedures but standards are lacking for structure and content of annotation SOPs. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse a central online repository of SOPs.

  9. EFFICIENT VIDEO ANNOTATIONS BY AN IMAGE GROUPS

    Directory of Open Access Journals (Sweden)

    K . Mahi balan

    2015-10-01

    Full Text Available Searching desirable events in uncontrolled videos is a challenging task. So, researches mainly focus on obtaining concepts from numerous labelled videos. But it is time consuming and labour expensive to collect a large amount of required labelled videos for training event models under various condition. To avoid this problem, we propose to leverage abundant Web images for videos since Web images contain a rich source of information with many events roughly annotated and taken under various conditions. However, information from the Web is difficult .so,brute force knowledge transfer of images may hurt the video annotation performance. so, we propose a novel Group-based Domain Adaptation learning framework to leverage different groups of knowledge (source target queried from the Web image search engine to consumer videos (domain target. Different from old methods using multiple source domains of images, our method makes the Web images according to their intrinsic semantic relationships instead of source. Specifically, two different types of groups ( event-specific groups and concept-specific groups are exploited to respectively describe the event-level and concept-level semantic meanings of target-domain videos.

  10. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array

    NARCIS (Netherlands)

    Slieker, R.C.; Bos, S.D.; Goeman, J.J.; Bovee, J.V.; Talens, R.P.; Breggen, R. van der; Suchiman, H.E.; Lameijer, E.W.; Putter, H.; Akker, E.B. van den; Zhang, Y.; Jukema, J.W.; Slagboom, P.E.; Meulenbelt, I.; Heijmans, B.T.

    2013-01-01

    BACKGROUND: DNA methylation has been recognized as a key mechanism in cell differentiation. Various studies have compared tissues to characterize epigenetically regulated genomic regions, but due to differences in study design and focus there still is no consensus as to the annotation of genomic reg

  11. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array

    NARCIS (Netherlands)

    Slieker, R.C.; Bos, S.D.; Goeman, J,J; Bovee, J.V.M.G.; Talens, R.P.; Van der Breggen, R.; Suchiman, H.E.D.; Lameijer, E.W.; Putter, H.; Van dern Akker, E.B.; Zhang, Y.; Jukema, J.W.; Slagboom, P.E.; Meulenbelt, I.; Heijmans, B.T.

    2013-01-01

    Background: DNA methylation has been recognized as a key mechanism in cell differentiation. Various studies have compared tissues to characterize epigenetically regulated genomic regions, but due to differences in study design and focus there still is no consensus as to the annotation of genomic reg

  12. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease.

    Science.gov (United States)

    Sifrim, Alejandro; Van Houdt, Jeroen Kj; Tranchevent, Leon-Charles; Nowakowska, Beata; Sakai, Ryo; Pavlopoulos, Georgios A; Devriendt, Koen; Vermeesch, Joris R; Moreau, Yves; Aerts, Jan

    2012-01-01

    The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.

  13. Evolutionary interrogation of human biology in well-annotated genomic framework of rhesus macaque.

    Science.gov (United States)

    Zhang, Shi-Jian; Liu, Chu-Jun; Yu, Peng; Zhong, Xiaoming; Chen, Jia-Yu; Yang, Xinzhuang; Peng, Jiguang; Yan, Shouyu; Wang, Chenqu; Zhu, Xiaotong; Xiong, Jingwei; Zhang, Yong E; Tan, Bertrand Chin-Ming; Li, Chuan-Yun

    2014-05-01

    With genome sequence and composition highly analogous to human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, the RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical link of human-biased regulations to human-specific traits, by using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research in general and serves as an important resource for in-depth evolutionary studies of human biology.

  14. Annotated receipts capture household food purchases from a broad range of sources

    Directory of Open Access Journals (Sweden)

    Shimotsu Scott T

    2009-07-01

    Full Text Available Abstract Background Accurate measurement of household food purchase behavior (HFPB is important for understanding its association with household characteristics, individual dietary intake and neighborhood food retail outlets. However, little research has been done to develop measures of HFPB. The main objective of this paper is to describe the development of a measure of HFPB using annotated food purchase receipts. Methods Households collected and annotated food purchase receipts for a four-week period as part of the baseline assessment of a household nutrition intervention. Receipts were collected from all food sources, including grocery stores and restaurants. Households (n = 90 were recruited from the community as part of an obesity prevention intervention conducted in 2007–2008 in Minneapolis, Minnesota, USA. Household primary shoppers were trained to follow a standardized receipt collection and annotation protocol. Annotated receipts were mailed weekly to research staff. Staff coded the receipt data and entered it into a database. Total food dollars, proportion of food dollars, and ounces of food purchased were examined for different food sources and food categories. Descriptive statistics and correlations are presented. Results A total of 2,483 receipts were returned by 90 households. Home sources comprised 45% of receipts and eating-out sources 55%. Eating-out entrees were proportionally the largest single food category based on counts (16.6% and dollars ($106 per month. Two-week expenditures were highly correlated (r = 0.83 with four-week expenditures. Conclusion Receipt data provided important quantitative information about HFPB from a wide range of sources and food categories. Two weeks may be adequate to reliably characterize HFPB using annotated receipts.

  15. Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction.

    Science.gov (United States)

    Névéol, Aurélie; Islamaj Doğan, Rezarta; Lu, Zhiyong

    2011-04-01

    Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% less when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed-up the annotation time and improve annotation consistency while maintaining high quality of the final annotations.

  16. Mastery Learning and Mastery Testing: An Annotated ERIC Bibliography.

    Science.gov (United States)

    Wildemuth, Barbara M., Comp.

    This 136-item annotated bibliography on mastery learning and mastery testing is the result of a computer search of the ERIC data base in February 1977. All entries are listed alphabetically by author. An abstract or annotation is provided for each entry. A subject index is included reflecting the major emphasis of each citation. (RC)

  17. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  18. Collaborative Paper-Based Annotation of Lecture Slides

    Science.gov (United States)

    Steimle, Jurgen; Brdiczka, Oliver; Muhlhauser, Max

    2009-01-01

    In a study of notetaking in university courses, we found that the large majority of students prefer paper to computer-based media like Tablet PCs for taking notes and making annotations. Based on this finding, we developed CoScribe, a concept and system which supports students in making collaborative handwritten annotations on printed lecture…

  19. Product annotations - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ...ile name: kome_product_annotation.zip File URL: ftp://ftp.biosciencedbc.jp/archiv...ate History of This Database Site Policy | Contact Us Product annotations - KOME | LSDB Archive ...

  20. Orienteering: An Annotated Bibliography = Orientierungslauf: Eine kommentierte Bibliographie.

    Science.gov (United States)

    Seiler, Roland, Ed.; Hartmann, Wolfgang, Ed.

    1994-01-01

    Annotated bibliography of 220 books, monographs, and journal articles on orienteering published 1984-94, from SPOLIT database of the Federal Institute of Sport Science (Cologne, Germany). Annotations in English or German. Ten sections including psychological, physiological, health, sociological, and environmental aspects; training and coaching;…

  1. Behavioral Contributions to "Teaching of Psychology": An Annotated Bibliography

    Science.gov (United States)

    Karsten, A. M.; Carr, J. E.

    2008-01-01

    An annotated bibliography that summarizes behavioral contributions to the journal "Teaching of Psychology" from 1974 to 2006 is provided. A total of 116 articles of potential utility to college-level instructors of behavior analysis and related areas were identified, annotated, and organized into nine categories for ease of accessibility.…

  2. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  3. The RAST Server: Rapid Annotations using Subsystems Technology

    Directory of Open Access Journals (Sweden)

    Overbeek Ross A

    2008-02-01

    Full Text Available Abstract Background The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

  4. Solving the Problem: Genome Annotation Standards before the Data Deluge

    Science.gov (United States)

    Klimke, William; O'Donovan, Claire; White, Owen; Brister, J. Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D.; Tatusova, Tatiana

    2011-01-01

    The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries. PMID:22180819

  5. JAFA: a protein function annotation meta-server

    DEFF Research Database (Denmark)

    Friedberg, Iddo; Harder, Tim; Godzik, Adam

    2006-01-01

    With the high number of sequences and structures streaming in from genomic projects, there is a need for more powerful and sophisticated annotation tools. Most problematic of the annotation efforts is predicting gene and protein function. Over the past few years there has been considerable progre...

  6. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of human genome project is near the end, the work has begun for discovering novel genes from genome sequences and annotating their biological functions. Here are reviewed current major bioinformatics tools and technologies available for large scale gene discovery and annotation from human genome sequences. Some ideas about possible future development are also provided.

  7. From the Margins to the Center: The Future of Annotation.

    Science.gov (United States)

    Wolfe, Joanna L.; Neuwirth, Christine M.

    2001-01-01

    Describes the importance of annotation to reading and writing practices and reviews new technologies that complicate the ways annotation can be used to support and enhance traditional reading, writing, and collaboration processes. Emphasizes issues and methods that will be productive for enhancing theories of workplace and classroom communication…

  8. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  9. Online Metacognitive Strategies, Hypermedia Annotations, and Motivation on Hypertext Comprehension

    Science.gov (United States)

    Shang, Hui-Fang

    2016-01-01

    This study examined the effect of online metacognitive strategies, hypermedia annotations, and motivation on reading comprehension in a Taiwanese hypertext environment. A path analysis model was proposed based on the assumption that if English as a foreign language learners frequently use online metacognitive strategies and hypermedia annotations,…

  10. Semantic annotation of clinical events for generating a problem list.

    Science.gov (United States)

    Mowery, Danielle L; Jordan, Pamela; Wiebe, Janyce; Harkema, Henk; Dowling, John; Chapman, Wendy W

    2013-01-01

    We present a pilot study of an annotation schema representing problems and their attributes, along with their relationship to temporal modifiers. We evaluated the ability for humans to annotate clinical reports using the schema and assessed the contribution of semantic annotations in determining the status of a problem mention as active, inactive, proposed, resolved, negated, or other. Our hypothesis is that the schema captures semantic information useful for generating an accurate problem list. Clinical named entities such as reference events, time points, time durations, aspectual phase, ordering words and their relationships including modifications and ordering relations can be annotated by humans with low to moderate recall. Once identified, most attributes can be annotated with low to moderate agreement. Some attributes - Experiencer, Existence, and Certainty - are more informative than other attributes - Intermittency and Generalized/Conditional - for predicting a problem mention's status. Support vector machine outperformed Naïve Bayes and Decision Tree for predicting a problem's status.

  11. The CLEF corpus: semantic annotation of clinical text.

    Science.gov (United States)

    Roberts, Angus; Gaizauskas, Robert; Hepple, Mark; Davis, Neil; Demetriou, George; Guo, Yikun; Kola, Jay; Roberts, Ian; Setzer, Andrea; Tapuria, Archana; Wheeldin, Bill

    2007-10-11

    The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20 thousand patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.

  12. On Semantic Annotation in Clarin-PL Parallel Corpora

    Directory of Open Access Journals (Sweden)

    Violetta Koseska-Toszewa

    2015-12-01

    Full Text Available On Semantic Annotation in Clarin-PL Parallel Corpora In the article, the authors present a proposal for semantic annotation in Clarin-PL parallel corpora: Polish-Bulgarian-Russian and Polish-Lithuanian ones. Semantic annotation of quantification is a novum in developing sentence level semantics in multilingual parallel corpora. This is why our semantic annotation is manual. The authors hope it will be interesting to IT specialists working on automatic processing of the given natural languages. Semantic annotation defined the way it is defined here will make contrastive studies of natural languages more efficient, which in turn will help verify the results of those studies, and will certainly improve human and machine translations.

  13. Annotated bibliography of software engineering laboratory literature

    Science.gov (United States)

    Kistler, David; Bristow, John; Smith, Don

    1994-01-01

    This document is an annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory. Nearly 200 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. This document has been updated and reorganized substantially since the original version (SEL-82-006, November 1982). All materials have been grouped into eight general subject areas for easy reference: (1) The Software Engineering Laboratory; (2) The Software Engineering Laboratory: Software Development Documents; (3) Software Tools; (4) Software Models; (5) Software Measurement; (6) Technology Evaluations; (7) Ada Technology; and (8) Data Collection. This document contains an index of these publications classified by individual author.

  14. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas;

    2007-01-01

    Motivation: Viral genomes tend to code in overlapping reading frames to maximize information content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra...... reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we...... may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We...

  15. The High Throughput Sequence Annotation Service (HT-SAS – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Full Text Available Abstract Background Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein a poll of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine the accuracy and usefulness of our approach. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium bergei expressed exclusively during the blood stage. Conclusion Using HT-SAS we created new, or enriched already established annotations for over 20% of proteins from Plasmodium bergei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often also when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources and for researchers working with protein datasets, especially from poorly characterized organisms.

  16. About Certain Semantic Annotation in Parallel Corpora

    Directory of Open Access Journals (Sweden)

    Violetta Koseska-Toszewa

    2015-06-01

    Full Text Available About Certain Semantic Annotation in Parallel Corpora The semantic notation analyzed in this works is contained in the second stream of semantic theories presented here – in the direct approach semantics. We used this stream in our work on the Bulgarian-Polish Contrastive Grammar. Our semantic notation distinguishes quantificational meanings of names and predicates, and indicates aspectual and temporal meanings of verbs. It relies on logical scope-based quantification and on the contemporary theory of processes, known as “Petri nets”. Thanks to it, we can distinguish precisely between a language form and its contents, e.g. a perfective verb form has two meanings: an event or a sequence of events and states, finally ended with an event. An imperfective verb form also has two meanings: a state or a sequence of states and events, finally ended with a state. In turn, names are quantified universally or existentially when they are “undefined”, and uniquely (using the iota operator when they are “defined”. A fact worth emphasizing is the possibility of quantifying not only names, but also the predicate, and then quantification concerns time and aspect.  This is a novum in elaborating sentence-level semantics in parallel corpora. For this reason, our semantic notation is manual. We are hoping that it will raise the interest of computer scientists working on automatic methods for processing the given natural languages. Semantic annotation defined like in this work will facilitate contrastive studies of natural languages, and this in turn will verify the results of those studies, and will certainly facilitate human and machine translations.

  17. Evaluation of probabilistic and logical inference for a SNP annotation system.

    Science.gov (United States)

    Shen, Terry H; Tarczy-Hornoch, Peter; Detwiler, Landon T; Cadag, Eithon; Carlson, Christopher S

    2010-06-01

    Genome wide association studies (GWAS) are an important approach to understanding the genetic mechanisms behind human diseases. Single nucleotide polymorphisms (SNPs) are the predominant markers used in genome wide association studies, and the ability to predict which SNPs are likely to be functional is important for both a priori and a posteriori analyses of GWA studies. This article describes the design, implementation and evaluation of a family of systems for the purpose of identifying SNPs that may cause a change in phenotypic outcomes. The methods described in this article characterize the feasibility of combinations of logical and probabilistic inference with federated data integration for both point and regional SNP annotation and analysis. Evaluations of the methods demonstrate the overall strong predictive value of logical, and logical with probabilistic, inference applied to the domain of SNP annotation.

  18. Metalloproteomics: High-Throughput Structural and Functional Annotation of Proteins in Structural Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Shi,W.; Zhan, C.; Lgnatov, A.; Manjasetty, B.; Marinkovic, N.; Sullivan, M.; Huang, R.; Chance, M.; Li, H.; et al.

    2005-01-01

    A high-throughput method for measuring transition metal content based on quantitation of X-ray fluorescence signals was used to analyze 654 proteins selected as targets by the New York Structural GenomiX Research Consortium. Over 10% showed the presence of transition metal atoms in stoichiometric amounts; these totals as well as the abundance distribution are similar to those of the Protein Data Bank. Bioinformatics analysis of the identified metalloproteins in most cases supported the metalloprotein annotation; identification of the conserved metal binding motif was also shown to be useful in verifying structural models of the proteins. Metalloproteomics provides a rapid structural and functional annotation for these sequences and is shown to be {approx}95% accurate in predicting the presence or absence of stoichiometric metal content. The project's goal is to assay at least 1 member from each Pfam family; approximately 500 Pfam families have been characterized with respect to transition metal content so far.

  19. cDNA2Genome: A tool for mapping and annotating cDNAs

    Directory of Open Access Journals (Sweden)

    Suhai Sandor

    2003-09-01

    Full Text Available Abstract Background In the last years several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However some of these cDNAs are error prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first require the identification of potentially problematic cDNAs in order to speed up the posterior annotation process. Results cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up to date databases, especially in the case of ESTs and mRNAs in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis such us program pipelines, or the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in a graphical form. cDNA2Genome has been implemented under the W3H task framework which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis. Conclusions cDNA2Genome represents a new versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.

  20. Current and future trends in marine image annotation software

    Science.gov (United States)

    Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

    2016-12-01

    Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation - the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) have enabled over 500 publications to date. We review the functioning, application trends and developments, by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface, with a video player or image browser that recognizes a specific time code or image code, allowing to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software by the capability of integrating data associated to video collection, the most simple being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, posteriorly to annotation and interact with a database. These range from simple annotation interfaces, to full onboard data management systems, with a variety of toolboxes. Advanced packages allow to input and display data from multiple sensors or multiple annotators via intranet or internet. Posterior human-mediated annotation often include tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or to analyze the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images

  1. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Directory of Open Access Journals (Sweden)

    Jianfang Cao

    2015-01-01

    Full Text Available With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.

  2. Ontology modularization to improve semantic medical image annotation.

    Science.gov (United States)

    Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul

    2011-02-01

    Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail thus making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data contained within should have been previously annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time consuming labor intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results.

  3. Fuzzy emotional semantic analysis and automated annotation of scene images.

    Science.gov (United States)

    Cao, Jianfang; Chen, Lichao

    2015-01-01

    With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.

  4. Open semantic annotation of scientific publications using DOMEO

    Directory of Open Access Journals (Sweden)

    Ciccarese Paolo

    2012-04-01

    Full Text Available Abstract Background Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. Methods The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO. Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. Results We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org and is scheduled for production deployment in the NIF’s next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework.

  5. MimoSA: a system for minimotif annotation

    Directory of Open Access Journals (Sweden)

    Kundeti Vamsi

    2010-06-01

    Full Text Available Abstract Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to

  6. An annotation system for 3D fluid flow visualization

    Science.gov (United States)

    Loughlin, Maria M.; Hughes, John F.

    1995-01-01

    Annotation is a key activity of data analysis. However, current systems for data analysis focus almost exclusively on visualization. We propose a system which integrates annotations into a visualization system. Annotations are embedded in 3D data space, using the Post-it metaphor. This embedding allows contextual-based information storage and retrieval, and facilitates information sharing in collaborative environments. We provide a traditional database filter and a Magic Lens filter to create specialized views of the data. The system has been customized for fluid flow applications, with features which allow users to store parameters of visualization tools and sketch 3D volumes.

  7. A resource-based Korean morphological annotation system

    CERN Document Server

    Huh, Hyun-Gue

    2007-01-01

    We describe a resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. The output of our system is a graph of morphemes annotated with accurate linguistic information. The language resources used by the system can be easily updated, which allows us-ers to control the evolution of the per-formances of the system. We show that morphological annotation of Korean text can be performed directly with a lexicon of words and without morpho-logical rules.

  8. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...

  9. Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data.

    Directory of Open Access Journals (Sweden)

    Laura Miozzi

    Full Text Available BACKGROUND: High-throughput gene expression data can predict gene function through the "guilt by association" principle: coexpressed genes are likely to be functionally associated. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG, small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated to similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. CONCLUSIONS/SIGNIFICANCE: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations we provide candidate genes for several genetic diseases of unknown molecular basis.

  10. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures

    Science.gov (United States)

    Paladin, Lisanna; Hirsh, Layla; Piovesan, Damiano; Andrade-Navarro, Miguel A.; Kajava, Andrey V.; Tosatto, Silvio C.E.

    2017-01-01

    RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality is guaranteed by an extensive manual validation for >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully re-designed entry page for a better overview of structural data. It is now possible to compare unit positions, together with secondary structure, fold information and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60% and 90% identity. PMID:27899671

  11. Annotation and Curation of Uncharacterized proteins- Challenges

    Directory of Open Access Journals (Sweden)

    Johny eIjaq

    2015-03-01

    Full Text Available Hypothetical Proteins are the proteins that are predicted to be expressed from an open reading frame (ORF, constituting a substantial fraction of proteomes in both prokaryotes and eukaryotes. Genome projects have led to the identification of many therapeutic targets, the putative function of the protein and their interactions. In this review we have enlisted various methods. Annotation linked to structural and functional prediction of hypothetical proteins assist in the discovery of new structures and functions serving as markers and pharmacological targets for drug designing, discovery and screening. Mass spectrometry is an analytical technique for validating protein characterisation. Matrix-assisted laser desorption ionization–mass spectrometry (MALDI-MS is an efficient analytical method. Microarrays and Protein expression profiles help understanding the biological systems through a systems-wide study of proteins and their interactions with other proteins and non-proteinaceous molecules to control complex processes in cells and tissues and even whole organism. Next generation sequencing technology accelerates multiple areas of genomics research.

  12. Semantic annotation for biological information retrieval system.

    Science.gov (United States)

    Oshaiba, Mohamed Marouf Z; El Houby, Enas M F; Salah, Akram

    2015-01-01

    Online literatures are increasing in a tremendous rate. Biological domain is one of the fast growing domains. Biological researchers face a problem finding what they are searching for effectively and efficiently. The aim of this research is to find documents that contain any combination of biological process and/or molecular function and/or cellular component. This research proposes a framework that helps researchers to retrieve meaningful documents related to their asserted terms based on gene ontology (GO). The system utilizes GO by semantically decomposing it into three subontologies (cellular component, biological process, and molecular function). Researcher has the flexibility to choose searching terms from any combination of the three subontologies. Document annotation is taking a place in this research to create an index of biological terms in documents to speed the searching process. Query expansion is used to infer semantically related terms to asserted terms. It increases the search meaningful results using the term synonyms and term relationships. The system uses a ranking method to order the retrieved documents based on the ranking weights. The proposed system achieves researchers' needs to find documents that fit the asserted terms semantically.

  13. MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

    Science.gov (United States)

    Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

    2011-11-01

    The use of Next-Generation Sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge in programming.

  14. Evolutionarily conserved substrate substructures for automated annotation of enzyme superfamilies.

    Directory of Open Access Journals (Sweden)

    Ranyee A Chiang

    Full Text Available The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized

  15. Annotation et rature Annotation and Deletion: Outline of a Sociology of Forms

    Directory of Open Access Journals (Sweden)

    Axel Pohn-Weidinger

    2012-05-01

    Full Text Available Ce texte interroge les traces graphiques laissées sur un corpus de formulaires de demande de logement social telles qu’annotations, ratures, biffures et commentaires griffonnés. L’étude de ces traces, laissées en marge des catégories de l’imprimé administratif lors du remplissage, montre le recours au droit comme une opération problématique. Pour les administrés, il s’agit de décrire leur situation de vie de sorte à établir l’éligibilité à un droit, mais bien souvent il est impossible de traduire celle-ci dans les catégories préétablies du formulaire. Les annotations et commentaires laissés sur le formulaire tentent alors d’ouvrir la catégorisation juridique des situations à une prise en compte de la singularité des circonstances de vie du demandeur. Elles montrent le recours au droit comme un accomplissement réflexif, un travail à la fois sur sa propre perception de sa situation et sur celle que l’institution offre à travers le formulaire, et dont la négociation et la mise en œuvre sont au cœur de la production du dossier administratif.This text examines the graphical traces left on a collection of social housing application forms: annotations, erasures, crossed-out words and scribbled-out comments. The study of these traces, left in the margins of the categories on printed administrative forms in the process of being completed, shows the exercising of a right as a problematic operation. Citizens making applications must describe their living situation in a way that will establish their eligibility for a right, but quite often it is impossible to convey this through the form’s predetermined categories. The annotations and comments left on the form attempt to open the legal classification of situations to considering the uniqueness of the applicant’s living circumstances. They show the use of a right as an introspective accomplishment, requiring applicants to work both on their own perception of

  16. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    Directory of Open Access Journals (Sweden)

    Kumar Parijat Tripathi

    Full Text Available RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basis Local Search Alignment Tool, QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery tools. It offers a report on statistical analysis of functional and Gene Ontology (GO annotation's enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/proteins features and protein-protein interactions related informations. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each submitted transcript to the web server. The implementation of QuickGo web-services in our pipeline enable the users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA by ab initio methods helps to identify the non coding RNAs and their regulatory role in transcriptome. In summary, Transcriptator is a useful software for both NGS and array data. It helps the users to characterize the de-novo assembled reads, obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is

  17. Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

    Directory of Open Access Journals (Sweden)

    Smith Richard D

    2011-08-01

    Full Text Available Abstract Background Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (> 15%. Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Results We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function. Conclusion This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

  18. Analysis of Annotation on Documents for Recycling Information

    Science.gov (United States)

    Nakai, Tomohiro; Kondo, Nobuyuki; Kise, Koichi; Matsumoto, Keinosuke

    In order to make collaborative business activities fruitful, it is essential to know characteristics of organizations and persons in more details and to gather information relevant to the activities. In this paper, we describe a notion of “information recycle" that actualizes these requirements by analyzing documents. The key of recycling information is to utilize annotations on documents as clues for generating users' profiles and for weighting contents in the context of the activities. We also propose a method of extracting annotations on paper documents just by pressing one button with the help of techniques of camera-based document image analysis. Experimental results demonstrate that it is fundamentally capable of acquiring annotations on paper documents on condition that their electronic versions without annotations are available for the processing.

  19. Collaborative Semantic Annotation of Images : Ontology-Based Model

    Directory of Open Access Journals (Sweden)

    Damien E. ZOMAHOUN

    2015-12-01

    Full Text Available In the quest for models that could help to represen t the meaning of images, some approaches have used contextual knowledge by building semantic hierarchi es. Others have resorted to the integration of imag es analysis improvement knowledge and images interpret ation using ontologies. The images are often annotated with a set of keywords (or ontologies, w hose relevance remains highly subjective and relate d to only one interpretation (one annotator. However , an image can get many associated semantics because annotators can interpret it differently. Th e purpose of this paper is to propose a collaborati ve annotation system that brings out the meaning of im ages from the different interpretations of annotato rs. The different works carried out in this paper lead to a semantic model of an image, i.e. the different means that a picture may have. This method relies o n the different tools of the Semantic Web, especial ly ontologies.

  20. Creating New Medical Ontologies for Image Annotation A Case Study

    CERN Document Server

    Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

    2012-01-01

    Creating New Medical Ontologies for Image Annotation focuses on the problem of the medical images automatic annotation process, which is solved in an original manner by the authors. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by authors are compared with other efficient similar algorithms. In addition, the authors treat the problem of creating ontologies in an automatic way, starting from Medical Subject Headings (MESH). They have presented some efficient and relevant annotation models and also the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Based on a text query the system will retrieve the images that contain objects described by the keywords.

  1. Bayesian Framework for Automatic Image Annotation Using Visual Keywords

    Science.gov (United States)

    Agrawal, Rajeev; Wu, Changhua; Grosky, William; Fotouhi, Farshad

    In this paper, we propose a Bayesian probability based framework, which uses visual keywords and already available text keywords to automatically annotate the images. Taking the cue from document classification, an image can be considered as a document and objects present in it as words. Using this concept, we can create visual keywords by dividing an image into tiles based on a certain template size. Visual keywords are simple vector quantization of small-sized image tiles. We estimate the conditional probability of a text keyword in the presence of visual keywords, described by a multivariate Gaussian distribution. We demonstrate the effectiveness of our approach by comparing predicted text annotations with manual annotations and analyze the effect of text annotation length on the performance.

  2. Effects of dehydration on performance in man: Annotated bibliography

    Science.gov (United States)

    Greenleaf, J. E.

    1973-01-01

    A compilation of studies on the effect of dehydration on human performance and related physiological mechanisms. The annotations are listed in alphabetical order by first author and cover material through June 1973.

  3. An Annotated Checklist of the Fishes of Samoa

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — All fishes currently known from the Samoan Islands are listed by their scientific and Samoan names. Species entries are annotated to include the initial Samoan...

  4. Homosexuality in Young Adult Fiction and Nonfiction: An Annotated Bibliography.

    Science.gov (United States)

    Webunder, Dave; Woodard, Sarah

    1996-01-01

    Contains a bibliography of 14 adolescent novels and journal articles (published between 1976 and 1994) relating to homosexuality and homophobia, with evaluative annotations outlining themes and plots and offering suggestions for instruction. (TB)

  5. Responsibility in Governmental-Political Communication: A Selected, Annotated Bibliography.

    Science.gov (United States)

    Johannesen, Richard L.

    This annotated bibliography lists 43 books, periodicals, and essays in the area of governmental-political communication. Topics include: social justice, lying, cheating, ethics, public duties, public policy, language, rhetorical strategies, and propaganda. (MS)

  6. Freedom of Speech: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Tedford, Thomas L.

    This bibliography lists 36 books related to problems of freedom of speech. General sources (history, analyses, texts, and anthologies) are listed separately from those dealing with censorship of obscenity and pornography. Each entry is briefly annotated. (AA)

  7. Using Apollo to browse and edit genome annotations.

    Science.gov (United States)

    Misra, Sima; Harris, Nomi

    2006-01-01

    An annotation is any feature that can be tied to genomic sequence, such as an exon, transcript, promoter, or transposable element. As biological knowledge increases, annotations of different types need to be added and modified, and links to other sources of information need to be incorporated, to allow biologists to easily access all of the available sequence analysis data and design appropriate experiments. The Apollo genome browser and editor offers biologists these capabilities. Apollo can display many different types of computational evidence, such as alignments and similarities based on BLAST searches (UNITS 3.3 & 3.4), and enables biologists to utilize computational evidence to create and edit gene models and other genomic features, e.g., using experimental evidence to refine exon-intron structures predicted by gene prediction algorithms. This protocol describes simple ways to browse genome annotation data, as well as techniques for editing annotations and loading data from different sources.

  8. OntoELAN: An Ontology-based Linguistic Multimedia Annotator

    CERN Document Server

    Chebotko, Artem; Lu, Shiyong; Fotouhi, Farshad; Aristar, Anthony; Brugman, Hennie; Klassmann, Alexander; Sloetjes, Han; Russel, Albert; Wittenburg, Peter

    2009-01-01

    Despite its scientific, political, and practical value, comprehensive information about human languages, in all their variety and complexity, is not readily obtainable and searchable. One reason is that many language data are collected as audio and video recordings which imposes a challenge to document indexing and retrieval. Annotation of multimedia data provides an opportunity for making the semantics explicit and facilitates the searching of multimedia documents. We have developed OntoELAN, an ontology-based linguistic multimedia annotator that features: (1) support for loading and displaying ontologies specified in OWL; (2) creation of a language profile, which allows a user to choose a subset of terms from an ontology and conveniently rename them if needed; (3) creation of ontological tiers, which can be annotated with profile terms and, therefore, corresponding ontological terms; and (4) saving annotations in the XML format as Multimedia Ontology class instances and, linked to them, class instances of o...

  9. Geothermal wetlands: an annotated bibliography of pertinent literature

    Energy Technology Data Exchange (ETDEWEB)

    Stanley, N.E.; Thurow, T.L.; Russell, B.F.; Sullivan, J.F.

    1980-05-01

    This annotated bibliography covers the following topics: algae, wetland ecosystems; institutional aspects; macrophytes - general, production rates, and mineral absorption; trace metal absorption; wetland soils; water quality; and other aspects of marsh ecosystems. (MHR)

  10. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    CERN Document Server

    Hassanzadeh, Hamed; 10.5121/ijwest.2011.2203

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as multilinguality, scalability, and issues which are related to diversity and inconsistency in content of different web pages. Due to the wide range of domains and the dynamic environments that the Semantic Annotation systems must be performed on, the problem of automating annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like, semi-supervised learning and active learn...

  11. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Abstract Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO, pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provide measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25 between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a

  12. Annotating abstract pronominal anaphora in the DAD project

    DEFF Research Database (Denmark)

    Navarretta, Costanza; Olsen, Sussi Anni

    2008-01-01

    n this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The exten...... by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that th information proposed in the scheme can be recognised in a reliable way....

  13. Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

    Directory of Open Access Journals (Sweden)

    Danuta Roszko

    2015-06-01

    Full Text Available Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT formed for the idea of Polish-Lithuanian theoretical contrastive studies, a Polish-Lithuanian electronic dictionary, and as help for a sworn translator. The semantic annotation being brought into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.

  14. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  15. Improving the Caenorhabditis elegans genome annotation using machine learning.

    Directory of Open Access Journals (Sweden)

    Gunnar Rätsch

    2007-02-01

    Full Text Available For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions and 95% (coding regions only of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions, thus we hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth only in 10%-13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology.

  16. Semantic annotation of biological concepts interplaying microbial cellular responses

    Directory of Open Access Journals (Sweden)

    Carreira Rafael

    2011-11-01

    Full Text Available Abstract Background Automated extraction systems have become a time saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to Biomedical Text Mining tools is that of learning to recognise a wide variety of biological concepts with different functional roles to assist in these processes. Results Here, we present a novel corpus concerning the integrated cellular responses to nutrient starvation in the model-organism Escherichia coli. Our corpus is a unique resource in that it annotates biomedical concepts that play a functional role in expression, regulation and metabolism. Namely, it includes annotations for genetic information carriers (genes and DNA, RNA molecules, proteins (transcription factors, enzymes and transporters, small metabolites, physiological states and laboratory techniques. The corpus consists of 130 full-text papers with a total of 59043 annotations for 3649 different biomedical concepts; the two dominant classes are genes (highest number of unique concepts and compounds (most frequently annotated concepts, whereas other important cellular concepts such as proteins account for no more than 10% of the annotated concepts. Conclusions To the best of our knowledge, a corpus that details such a wide range of biological concepts has never been presented to the text mining community. The inter-annotator agreement statistics provide evidence of the importance of a consolidated background when dealing with such complex descriptions, the ambiguities naturally arising from the terminology and their impact for modelling purposes. Availability is granted for the full-text corpora of 130 freely accessible documents, the annotation scheme and the annotation guidelines. Also, we include a corpus of 340 abstracts.

  17. An Annotation Scheme for Free Word Order Languages

    CERN Document Server

    Skut, W; Brants, T; Uszkoreit, H; Skut, Wojciech; Krenn, Brigitte; Brants, Thorsten; Uszkoreit, Hans

    1997-01-01

    We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features have been added, influencing the architecture of the scheme. The resulting scheme reflects a stratificational notion of language, and makes only minimal assumptions about the interrelation of the particular representational strata.

  18. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  19. Annotation Method (AM): SE16_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE16_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  20. Annotation Method (AM): SE41_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available se search. Peaks with no hit to these databases are then selected to secondary sear...ch using EX-HR2 (http://webs2.kazusa.or.jp/mfsearcher/) databases. After the database search processes, each database...SE41_AM1 PowerGet annotation In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary databa

  1. Annotation Method (AM): SE1_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE1_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  2. Annotation Method (AM): SE29_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE29_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  3. Annotation Method (AM): SE28_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE28_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  4. Annotation Method (AM): SE25_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE25_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  5. Annotation Method (AM): SE40_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available se search. Peaks with no hit to these databases are then selected to secondary sear...ch using EX-HR2 (http://webs2.kazusa.or.jp/mfsearcher/) databases. After the database search processes, each database...SE40_AM1 PowerGet annotation In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary databa

  6. Annotation Method (AM): SE32_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE32_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  7. Annotation Method (AM): SE12_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE12_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  8. Annotation Method (AM): SE14_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE14_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  9. Annotation Method (AM): SE8_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE8_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  10. Annotation Method (AM): SE9_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE9_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  11. Annotation Method (AM): SE27_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE27_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  12. Annotation Method (AM): SE33_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE33_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  13. Annotation Method (AM): SE15_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE15_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  14. Annotation Method (AM): SE4_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE4_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  15. Annotation Method (AM): SE30_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE30_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  16. Annotation Method (AM): SE13_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE13_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  17. Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE11_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  18. Annotation Method (AM): SE34_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE34_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  19. Annotation Method (AM): SE7_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE7_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  20. Annotation Method (AM): SE5_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE5_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  1. Annotation Method (AM): SE2_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE2_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  2. Annotation Method (AM): SE17_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE17_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  3. Annotation Method (AM): SE20_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE20_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  4. Annotation Method (AM): SE3_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE3_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  5. Annotation Method (AM): SE35_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE35_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  6. Annotation Method (AM): SE36_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE36_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  7. Annotation Method (AM): SE6_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE6_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  8. Annotation Method (AM): SE31_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE31_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  9. Annotation Method (AM): SE10_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE10_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  10. Annotation Method (AM): SE26_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE26_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  11. Semantator: Annotating Clinical Narratives with Semantic Web Ontologies

    OpenAIRE

    Song, Dezhao; Chute, Christopher G; Tao, Cui

    2012-01-01

    To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manual annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfying. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. ...

  12. The Annotation Motivations and Characteristics of HUANG Jie ’s Annotation of XIE Kang - le’ s Poems%黄节《谢康乐诗注》的笺注动机和特色

    Institute of Scientific and Technical Information of China (English)

    李凤娇; 吉定

    2014-01-01

    HUANG Jie was devoted to the poetic annotations of the Han ,Wei and Six Dynasties and made a lot of achievements . Annotation of XIE Kang- le’ s Poems is among the classic ones .There are four reasons for HUANG Jie’s focus on poetic writings and poetic annotations :one is to educate and reform people and rectify public morals ;the second is to preserve the quintessence of Chinese culture ,to revive the ancient study ,and to revive“the special spirit of the country”;the third is to pay great attention to the decline of Kangle’s poetry and his worries which were always misunderstood ,so as to correct mistakes and commentate misun-derstandings ;the fourth is the result of the spiritual echo and the emotional alignment with XIE .Annotation of XIE Kang- le’ s Po-ems inherited the traditions of LI Shan’s annotation of Wen Xuan ,played up strengths and avoided weakness ;it commented on the poems with historical facts ,and annotated the poems in a poetic way ;this annotation had a very careful and objective language , pooled the good qualities of the masses .This annotation is also characterized by its annotation of the names of places and the annota-tion of consonants and vowels .Thus the above characteristics of HUANG’s annotation can considerably help people understand XIE’ s poems and the annotation has great academic values .%黄节先生曾致力于汉魏六朝诗注,成就斐然。《谢康乐诗注》便是其中的经典注释之一种。黄节先生之所以以诗为业,写诗注诗,一是为教化人心、匡正世风;二为保存国粹、复兴古学,重振“国家特别之精神”;三是心系康乐诗散亡、不易为人理解之忧而纠谬释误;四是与谢灵运精神相通、情感契合所致。《谢康乐诗注》继承李善《文选》注的传统,扬长避短;以史论诗、以诗注诗;案语严谨折衷、集众家所长。对于地名和声韵的注释,也颇具特点。以上笺注特色均为今人

  13. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  14. MITOS: improved de novo metazoan mitochondrial genome annotation.

    Science.gov (United States)

    Bernt, Matthias; Donath, Alexander; Jühling, Frank; Externbrink, Fabian; Florentz, Catherine; Fritzsch, Guido; Pütz, Joern; Middendorf, Martin; Stadler, Peter F

    2013-11-01

    About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq data base together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false positive annotations in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.

  15. Metagenomic gene annotation by a homology-independent approach

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computational intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5percent and 99.9percent accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As {approx}50percent of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulases candidates and 8,196 novel carbonic anhydrase candidates.In summary, we expect rhModeller to dramatically increase the speed and quality of metagnomic gene annotation.

  16. Multiple ontologies in action: composite annotations for biosimulation models.

    Science.gov (United States)

    Gennari, John H; Neal, Maxwell L; Galdzicki, Michal; Cook, Daniel L

    2011-02-01

    There now exists a rich set of ontologies that provide detailed semantics for biological entities of interest. However, there is not (nor should there be) a single source ontology that provides all the necessary semantics for describing biological phenomena. In the domain of physiological biosimulation models, researchers use annotations to convey semantics, and many of these annotations require the use of multiple reference ontologies. Therefore, we have developed the idea of composite annotations that access multiple ontologies to capture the physics-based meaning of model variables. These composite annotations provide the semantic expressivity needed to disambiguate the often-complex features of biosimulation models, and can be used to assist with model merging and interoperability. In this paper, we demonstrate the utility of composite annotations for model merging by describing their use within SemGen, our semantics-based model composition software. More broadly, if orthogonal reference ontologies are to meet their full potential, users need tools and methods to connect and link these ontologies. Our composite annotations and the SemGen tool provide one mechanism for leveraging multiple reference ontologies.

  17. AutoFACT: An Automatic Functional Annotation and Classification Tool

    Directory of Open Access Journals (Sweden)

    Lang B Franz

    2005-06-01

    Full Text Available Abstract Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1 analyzes nucleotide and protein sequence data; (2 determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3 assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4 generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at http://megasun.bch.umontreal.ca/Software/AutoFACT.htm.

  18. MIPS: analysis and annotation of genome information in 2007.

    Science.gov (United States)

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  19. A semi-automatic annotation tool for cooking video

    Science.gov (United States)

    Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

    2013-03-01

    In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotate the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.

  20. Morphological annotation of Korean with Directly Maintainable Resources

    CERN Document Server

    Berlocher, Ivan; Laporte, Eric; Nam, Jee-Sun

    2007-01-01

    This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes and transducers of generation of allomorphs. All can be easily updated, which allows users to control the evolution of the performances of the system. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of wor...

  1. Experiments with crowdsourced re-annotation of a POS tagging data set

    DEFF Research Database (Denmark)

    Hovy, Dirk; Plank, Barbara; Søgaard, Anders

    2014-01-01

    Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part......-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks....

  2. Enriching a biomedical event corpus with meta-knowledge annotation

    Directory of Open Access Journals (Sweden)

    Thompson Paul

    2011-10-01

    Full Text Available Abstract Background Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. Results We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme, as well as its application to the GENIA event corpus (1000 abstracts with 36,858 events. High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa. Conclusion By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative

  3. Establishment of the U.S. castor (Ricinus communis L.) core collection using seed chemical composition analysis and genotyping with EST-SSR markers

    Science.gov (United States)

    Natural genetic variation exists in the plant germplasm collections. Normally the germplasm collection for a specific species encompasses many accessions. Due to its large number of accessions, the entire collection is hard to handle and can’t be easily utilized. To facilitate the end-users (such as...

  4. Construction of an annotated corpus to support biomedical information extraction

    Directory of Open Access Journals (Sweden)

    McNaught John

    2009-10-01

    Full Text Available Abstract Background Information Extraction (IE is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC, consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining. Initial experiments have also shown that the corpus may

  5. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    Directory of Open Access Journals (Sweden)

    Peter E Larsen

    Full Text Available BACKGROUND: Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. METHODOLOGY: We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96% successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. CONCLUSIONS: 69% of expressed mycorrhizal JGI "best" gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene

  6. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    Energy Technology Data Exchange (ETDEWEB)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.; Lu, V.; Podila, G. K.; Collart, F. R.; Biosciences Division; Univ. of Alabama

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided

  7. Effective and Efficient Multi-Facet Web Image Annotation

    Institute of Scientific and Technical Information of China (English)

    Jia Chen; Yi-He Zhu; Hao-Fen Wang; Wei Jin; Yong Yu

    2012-01-01

    The vast amount of images available on the Web request for an effective and efficient search service to help users find relevant images.The prevalent way is to provide a keyword interface for users to submit queries.However,the amount of images without any tags or annotations are beyond the reach of manual efforts.To overcome this,automatic image annotation techniques emerge,which are generally a process of selecting a suitable set of tags for a given image without user intervention.However,there are three main challenges with respect to Web-scale image annotation:scalability,noiseresistance and diversity.Scalability has a twofold meaning:first an automatic image annotation system should be scalable with respect to billions of images on the Web; second it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster.Noise-resistance means that the system should be robust enough against typos and ambiguous terms used in tags.Diversity represents that image content may include both scenes and objects,which are further described by multiple different image features constituting different facets in annotation.In this paper,we propose a unified framework to tackle the above three challenges for automatic Web image annotation.It mainly involves two components:tag candidate retrieval and multi-facet annotation.In the former content-based indexing and concept-based codebook are leveraged to solve scalability and noise-resistance issues.In the latter the joint feature map has been designed to describe different facets of tags in annotations and the relations between these facets.Tag graph is adopted to represent tags in the entire annotation and the structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map.Millions of images from Flickr are used in our evaluation.Experimental results show that we have achieved 33% performance

  8. ESLO: from transcription to speakers' personal information annotation

    CERN Document Server

    Eshkol, Iris; Friburger, Nathalie

    2011-01-01

    This paper presents the preliminary works to put online a French oral corpus and its transcription. This corpus is the Socio-Linguistic Survey in Orleans, realized in 1968. First, we numerized the corpus, then we handwritten transcribed it with the Transcriber software adding different tags about speakers, time, noise, etc. Each document (audio file and XML file of the transcription) was described by a set of metadata stored in an XML format to allow an easy consultation. Second, we added different levels of annotations, recognition of named entities and annotation of personal information about speakers. This two annotation tasks used the CasSys system of transducer cascades. We used and modified a first cascade to recognize named entities. Then we built a second cascade to annote the designating entities, i.e. information about the speaker. These second cascade parsed the named entity annotated corpus. The objective is to locate information about the speaker and, also, what kind of information can designate ...

  9. DisBlue+: A distributed annotation-based C# compiler

    Directory of Open Access Journals (Sweden)

    Samir E. AbdelRahman

    2010-06-01

    Full Text Available Many programming languages utilize annotations to add useful information to the program but they still result in more tokens to be compiled and hence slower compilation time. Any current distributed compiler breaks the program into scattered disjoint pieces to speed up the compilation. However, these pieces cooperate synchronously and depend highly on each other. This causes massive overhead since messages, symbols, or codes must be roamed throughout the network. This paper presents two promising compilers named annotation-based C# (Blue+ and distributed annotation-based C# (DisBlue+. The proposed Blue+ annotation is based on axiomatic semantics to replace the if/loop constructs. As the developer tends to use many (complex conditions and repeat them in the program, such annotations reduce the compilation scanning time and increases the whole code readability. Built on the top of Blue+, DisBlue+ presents its proposed distributed concept which is to divide each program class to its prototype and definition, as disjoint distributed pieces, such that each class definition is compiled with only its related compiled prototypes (interfaces. Such concept reduces the amount of code transferred over the network, minimizes the dependencies among the disjoint pieces, and removes any possible synchronization between them. To test their efficiencies, Blue+ and DisBlue+ were verified with large-size codes against some existing compilers namely Javac, DJavac, and CDjava.

  10. Automatic annotation of lecture videos for multimedia driven pedagogical platforms

    Directory of Open Access Journals (Sweden)

    Ali Shariq Imran

    2016-12-01

    Full Text Available Today’s eLearning websites are heavily loaded with multimedia contents, which are often unstructured, unedited, unsynchronized, and lack inter-links among different multimedia components. Hyperlinking different media modality may provide a solution for quick navigation and easy retrieval of pedagogical content in media driven eLearning websites. In addition, finding meta-data information to describe and annotate media content in eLearning platforms is challenging, laborious, prone to errors, and time-consuming task. Thus annotations for multimedia especially of lecture videos became an important part of video learning objects. To address this issue, this paper proposes three major contributions namely, automated video annotation, the 3-Dimensional (3D tag clouds, and the hyper interactive presenter (HIP eLearning platform. Combining existing state-of-the-art SIFT together with tag cloud, a novel approach for automatic lecture video annotation for the HIP is proposed. New video annotations are implemented automatically providing the needed random access in lecture videos within the platform, and a 3D tag cloud is proposed as a new way of user interaction mechanism. A preliminary study of the usefulness of the system has been carried out, and the initial results suggest that 70% of the students opted for using HIP as their preferred eLearning platform at Gjøvik University College (GUC.

  11. Supporting Personal Semantic Annotations in P2P Semantic Wikis

    Science.gov (United States)

    Torres, Diego; Skaf-Molli, Hala; Díaz, Alicia; Molli, Pascal

    In this paper, we propose to extend Peer-to-Peer Semantic Wikis with personal semantic annotations. Semantic Wikis are one of the most successful Semantic Web applications. In semantic wikis, wikis pages are annotated with semantic data to facilitate the navigation, information retrieving and ontology emerging. Semantic data represents the shared knowledge base which describes the common understanding of the community. However, in a collaborative knowledge building process the knowledge is basically created by individuals who are involved in a social process. Therefore, it is fundamental to support personal knowledge building in a differentiated way. Currently there are no available semantic wikis that support both personal and shared understandings. In order to overcome this problem, we propose a P2P collaborative knowledge building process and extend semantic wikis with personal annotations facilities to express personal understanding. In this paper, we detail the personal semantic annotation model and show its implementation in P2P semantic wikis. We also detail an evaluation study which shows that personal annotations demand less cognitive efforts than semantic data and are very useful to enrich the shared knowledge base.

  12. ESTExplorer: an expressed sequence tag (EST) assembly and annotation platform.

    Science.gov (United States)

    Nagaraj, Shivashankar H; Deshpande, Nandan; Gasser, Robin B; Ranganathan, Shoba

    2007-07-01

    The analysis of expressed sequence tag (EST) datasets offers a rapid and cost-effective approach to elucidate the transcriptome of an organism, but requiring several computational methods for assembly and annotation. ESTExplorer is a comprehensive workflow system for EST data management and analysis. The pipeline uses a 'distributed control approach' in which the most appropriate bioinformatics tools are implemented over different dedicated processors. Species-specific repeat masking and conceptual translation are in-built. ESTExplorer accepts a set of ESTs in FASTA format which can be analysed using programs selected by the user. After pre-processing and assembly, the dataset is annotated at the nucleotide and protein levels, following conceptual translation. Users may optionally provide ESTExplorer with assembled contigs for annotation purposes. Functionally annotated contigs/ESTs can be analysed individually. The overall outputs are gene ontologies, protein functional identifications in terms of mapping to protein domains and metabolic pathways. ESTExplorer has been applied successfully to annotate large EST datasets from parasitic nematodes and to identify novel genes as potential targets for parasite intervention. ESTExplorer runs on a Linux cluster and is freely available for the academic community at http://estexplorer.biolinfo.org.

  13. Graph-based sequence annotation using a data integration approach.

    Science.gov (United States)

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  14. Annotations of Mexican bullfighting videos for semantic index

    Science.gov (United States)

    Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro

    2015-09-01

    The video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. Therefore, it is necessary generate a database with a semantic indexing that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme to make complex annotations in a video in the frame of multimedia search engine project. Each video is partitioned using our segmentation algorithm that creates shots of different length and different number of frames. In order to make complex annotations about the video, we use ELAN software. The annotations are done in two steps: First, we take note about the whole content in each shot. Second, we describe the actions as parameters of the camera like direction, position and deepness. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset. We also propose new concepts. This methodology allows to generate a database with the necessary information to create descriptors and algorithms capable to detect actions to automatically index and classify new bullfighting multimedia content.

  15. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. From identifying the nearest well annotated homologue of a protein of interest to predicting where misannotation has occurred to knowing how confident you can be in the annotations assigned to those proteins is critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here a describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  16. Annotation of the Asian Citrus Psyllid Genome Reveals a Reduced Innate Immune System.

    Science.gov (United States)

    Arp, Alex P; Hunter, Wayne B; Pelz-Stelinski, Kirsten S

    2016-01-01

    Citrus production worldwide is currently facing significant losses due to citrus greening disease, also known as Huanglongbing. The citrus greening bacteria, Candidatus Liberibacter asiaticus (CLas), is a persistent propagative pathogen transmitted by the Asian citrus psyllid, Diaphorina citri Kuwayama (Hemiptera: Liviidae). Hemipterans characterized to date lack a number of insect immune genes, including those associated with the Imd pathway targeting Gram-negative bacteria. The D. citri draft genome was used to characterize the immune defense genes present in D. citri. Predicted mRNAs identified by screening the published D. citri annotated draft genome were manually searched using a custom database of immune genes from previously annotated insect genomes. Toll and JAK/STAT pathways, general defense genes Dual oxidase, Nitric oxide synthase, prophenoloxidase, and cellular immune defense genes were present in D. citri. In contrast, D. citri lacked genes for the Imd pathway, most antimicrobial peptides, 1,3-β-glucan recognition proteins (GNBPs), and complete peptidoglycan recognition proteins. These data suggest that D. citri has a reduced immune capability similar to that observed in A. pisum, P. humanus, and R. prolixus. The absence of immune system genes from the D. citri genome may facilitate CLas infections, and is possibly compensated for by their relationship with their microbial endosymbionts.

  17. Pattern matching in indeterminate and Arc-annotated sequences.

    Science.gov (United States)

    Aumi, Md Tanvir Islam; Moosa, Tanaeem M; Rahman, M Sohel

    2013-08-01

    In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate Arc-Annotated references. Our algorithms run in O(m+ (nm) w) time where n and m are respectively the length of our reference and pattern strings and w is the target machine word size. Here we have assumed the alphabet size to be constant, because, indeterminate Arc-Annotated sequences are used to model biological sequences. Clearly, for short patterns, our algorithms run in linear time and efficient algorithms for matching short patterns to reference genomes have huge applications in practical settings. We have also applied our algorithms to scan the ncRNAs without pseudoknots. We scanned three whole human chromosomes and it took only 2.5 - 4 minutes to scan one whole chromosome for an ncRNA family. Some relevant patents are discussed in.

  18. Semantic annotation for live and posterity logging of video documents

    Science.gov (United States)

    Bertini, Marco; Del Bimbo, Alberto; Nunziati, W.

    2003-06-01

    Broadcasters usually envision two basic applications for video databases: Live Logging and Posterity Logging. The former aims at providing effective annotation of video in quasi-real time and supports extraction of meaningful clips from the live stream; it is usually performed by assistant producers working at the same location of the event. The latter provides annotation for later reuse of video material and is the prerequisite for retrieval by content from video digital libraries; it is performed by trained librarians. Both require that annotation is performed, at a great extent, automatically. Video information structure must encompass both low-intermediate level video organization and event relationships that define specific highlights and situations. Analysis of the visual data of the video stream permits to extract hints, identify events and detect highlights. All of this must be supported by a-priori knowledge of the video domain and effective reasoning engines capable to capture the inherent semantics of the visual events.

  19. Automatically Annotated Mapping for Indoor Mobile Robot Applications

    DEFF Research Database (Denmark)

    Özkil, Ali Gürcan; Howard, Thomas J.

    2012-01-01

    localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally...... consistent, automatically annotated hybrid metric-topological maps that is needed by mobile service robots.......This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...

  20. META2: Intercellular DNA Methylation Pairwise Annotation and Integrative Analysis

    Directory of Open Access Journals (Sweden)

    Binhua Tang

    2016-01-01

    Full Text Available Genome-wide deciphering intercellular differential DNA methylation as well as its roles in transcriptional regulation remains elusive in cancer epigenetics. Here we developed a toolkit META2 for DNA methylation annotation and analysis, which aims to perform integrative analysis on differentially methylated loci and regions through deep mining and statistical comparison methods. META2 contains multiple versatile functions for investigating and annotating DNA methylation profiles. Benchmarked with T-47D cell, we interrogated the association within differentially methylated CpG (DMC and region (DMR candidate count and region length and identified major transition zones as clues for inferring statistically significant DMRs; together we validated those DMRs with the functional annotation. Thus META2 can provide a comprehensive analysis approach for epigenetic research and clinical study.

  1. What a catch! traits that define good annotators.

    Science.gov (United States)

    Forbush, Tyler B; Shen, Shuying; South, Brett R; Duvalla, Scott L

    2013-01-01

    Human annotation and chart review is an important process in biomedical informatics research, but which humans are best suited for the job? Domain expertise, such as medical or linguistic knowledge is desirable, but other factors may be equally important. The University of Utah has a group of 20+ reviewers with backgrounds in medicine and linguistics, and 10 key traits have surfaced in those best able to annotate quickly and with high quality. To identify reviewers with these key traits, we created a hiring process that includes interviewing candidates, testing their medical and linguistic knowledge, and having them complete an annotation exercise on realistic medical text. Each step is designed to assess the key traits and allow the investigator to choose the skill set required for each project.

  2. Use of Annotations for Component and Framework Interoperability

    Science.gov (United States)

    David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

    2009-12-01

    The popular programming languages Java and C# provide annotations, a form of meta-data construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a meta-data standard for components and models. In version 3.0 framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively such as implicit multithreading, and auto-documenting capabilities while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to framework by the use of specific APIs and/or data types they can more easily be reused both within the framework as well as outside of it. To study the effectiveness of an annotation based framework approach with other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. In a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the

  3. Arc-preserving subsequences of arc-annotated sequences

    CERN Document Server

    Popov, Vladimir Yu

    2011-01-01

    Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. The longest arc-preserving common subsequence problem has been introduced as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures. We consider the longest arc preserving common subsequence problem. In particular, we show that the decision version of the 1-{\\sc fragment LAPCS(crossing,chain)} and the decision version of the 0-{\\sc diagonal LAPCS(crossing,chain)} are {\\bf NP}-complete for some fixed alphabet $\\Sigma$ such that $|\\Sigma| = 2$. Also we show that if $|\\Sigma| = 1$, then the decision version of the 1-{\\sc fragment LAPCS(unlimited, plain)} and the decision version of the 0-{\\sc diagonal LAPCS(unlimited, plain)} are {\\bf NP}-complete.

  4. Scripps Genome ADVISER: Annotation and Distributed Variant Interpretation SERver.

    Directory of Open Access Journals (Sweden)

    Phillip H Pham

    Full Text Available Interpretation of human genomes is a major challenge. We present the Scripps Genome ADVISER (SG-ADVISER suite, which aims to fill the gap between data generation and genome interpretation by performing holistic, in-depth, annotations and functional predictions on all variant types and effects. The SG-ADVISER suite includes a de-identification tool, a variant annotation web-server, and a user interface for inheritance and annotation-based filtration. SG-ADVISER allows users with no bioinformatics expertise to manipulate large volumes of variant data with ease--without the need to download large reference databases, install software, or use a command line interface. SG-ADVISER is freely available at genomics.scripps.edu/ADVISER.

  5. Hypertext Annotation: Effects of Presentation Formats and Learner Proficiency on Reading Comprehension and Vocabulary Learning in Foreign Languages

    Science.gov (United States)

    Chen, I-Jung; Yen, Jung-Chuan

    2013-01-01

    This study extends current knowledge by exploring the effect of different annotation formats, namely in-text annotation, glossary annotation, and pop-up annotation, on hypertext reading comprehension in a foreign language and vocabulary acquisition across student proficiencies. User attitudes toward the annotation presentation were also…

  6. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2007-01-01

    Full Text Available Abstract Background Members of the family Iridoviridae can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. In the present study, we describe the re-analysis of the Iridoviridae family of complex DNA viruses using a variety of comparative genomic tools to yield a greater consensus among the annotated sequences of its members. Results A series of genomic sequence comparisons were made among, and between the Ranavirus and Megalocytivirus genera in order to identify novel conserved ORFs. Of these two genera, the Megalocytivirus genomes required the greatest number of altered annotations. Prior to our re-analysis, the Megalocytivirus species orange-spotted grouper iridovirus and rock bream iridovirus shared 99% sequence identity, but only 82 out of 118 potential ORFs were annotated; in contrast, we predict that these species share an identical complement of genes. These annotation changes allowed the redefinition of the group of core genes shared by all iridoviruses. Seven new core genes were identified, bringing the total number to 26. Conclusion Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

  7. Determining similarity of scientific entities in annotation datasets.

    Science.gov (United States)

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/

  8. BRILIA: Integrated Tool for High-Throughput Annotation and Lineage Tree Assembly of B-Cell Repertoires

    Science.gov (United States)

    Lee, Donald W.; Khavrutskii, Ilja V.; Wallqvist, Anders; Bavari, Sina; Cooper, Christopher L.; Chaudhury, Sidhartha

    2017-01-01

    The somatic diversity of antigen-recognizing B-cell receptors (BCRs) arises from Variable (V), Diversity (D), and Joining (J) (VDJ) recombination and somatic hypermutation (SHM) during B-cell development and affinity maturation. The VDJ junction of the BCR heavy chain forms the highly variable complementarity determining region 3 (CDR3), which plays a critical role in antigen specificity and binding affinity. Tracking the selection and mutation of the CDR3 can be useful in characterizing humoral responses to infection and vaccination. Although tens to hundreds of thousands of unique BCR genes within an expressed B-cell repertoire can now be resolved with high-throughput sequencing, tracking SHMs is still challenging because existing annotation methods are often limited by poor annotation coverage, inconsistent SHM identification across the VDJ junction, or lack of B-cell lineage data. Here, we present B-cell repertoire inductive lineage and immunosequence annotator (BRILIA), an algorithm that leverages repertoire-wide sequencing data to globally improve the VDJ annotation coverage, lineage tree assembly, and SHM identification. On benchmark tests against simulated human and mouse BCR repertoires, BRILIA correctly annotated germline and clonally expanded sequences with 94 and 70% accuracy, respectively, and it has a 90% SHM-positive prediction rate in the CDR3 of heavily mutated sequences; these are substantial improvements over existing methods. We used BRILIA to process BCR sequences obtained from splenic germinal center B cells extracted from C57BL/6 mice. BRILIA returned robust B-cell lineage trees and yielded SHM patterns that are consistent across the VDJ junction and agree with known biological mechanisms of SHM. By contrast, existing BCR annotation tools, which do not account for repertoire-wide clonal relationships, systematically underestimated both the size of clonally related B-cell clusters and yielded inconsistent SHM frequencies. We demonstrate

  9. Web Image Retrieval Search Engine based on Semantically Shared Annotation

    Directory of Open Access Journals (Sweden)

    Alaa Riad

    2012-03-01

    Full Text Available This paper presents a new majority voting technique that combines the two basic modalities of Web images textual and visual features of image in a re-annotation and search based framework. The proposed framework considers each web page as a voter to vote the relatedness of keyword to the web image, the proposed approach is not only pure combination between image low level feature and textual feature but it take into consideration the semantic meaning of each keyword that expected to enhance the retrieval accuracy. The proposed approach is not used only to enhance the retrieval accuracy of web images; but also able to annotated the unlabeled images.

  10. Semantic Annotations and Querying of Web Data Sources

    Science.gov (United States)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to selected and access appropriate sources and to answer the queries.

  11. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    Directory of Open Access Journals (Sweden)

    Hamilton John P

    2007-10-01

    Full Text Available Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1 the submission of gene annotation to an annotation project, 2 the review of the submitted models by project annotators, and 3 the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP, an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the

  12. Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

    Directory of Open Access Journals (Sweden)

    Pietro Liò

    Full Text Available A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly-sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from closely related model organism. Here we present a general algorithm that allow to identify transcription factor binding sites in one newly sequenced species by performing Bayesian regression on the annotated species. First we set the rationale of our method by applying it within the same species, then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case when a certain number of experiments, from several species close to the species on which to make inference, are available. In order to show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compared the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate an effective utility in annotating new species sequences using all the available replicas from model species. The third case, where the gene network size varies among species, shows that the combination of information is less powerful but is still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.

  13. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA

    Science.gov (United States)

    Zuccaro, Antonio; Guarracino, Mario Rosario

    2015-01-01

    RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basis Local Search Alignment Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation’s enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/proteins features and protein—protein interactions related informations. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each submitted transcript to the web server. The implementation of QuickGo web-services in our pipeline enable the users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in transcriptome. In summary, Transcriptator is a useful software for both NGS and array data. It helps the users to characterize the de-novo assembled reads, obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is freely

  14. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera.

    Science.gov (United States)

    Lewis, James J; van der Burg, Karin R L; Mazo-Vargas, Anyi; Reed, Robert D

    2016-09-13

    Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.

  15. Religion and Communication: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Tate, Eugene D.; McConnell, Kathleen

    This annotated bibliography provides a broad perspective on the way religion and communication relate to one another. Forty-two references are listed in the areas of (1) "Religion and Communication Theory"; (2) "Religion and Language"; (3) "Televangelism and Televangelists"; (4) "Historical Roots of Religion and…

  16. A Selected, Annotated Bibliography on Employment of Minority Engineers.

    Science.gov (United States)

    National Academy of Sciences - National Research Council, Washington, DC. Assembly of Engineering.

    The annotated bibliography is intended to inform those concerned with personnel guidance, recruiting, and hiring in industry, research, education, and government about the available publications relating to the employment of Black, Mexican American, Puerto Rican, and American Indian engineers. To facilitate its usefulness, the bibliography is…

  17. Shakespeare Is Alive and Well in Cyberspace: An Annotated Bibliography.

    Science.gov (United States)

    Hett, Dorothy Marie

    2002-01-01

    Suggests that in addition to using books and movies to enhance students' understanding of Shakespeare, teachers can add the World Wide Web to their repertoire to help students connect to Shakespeare. Presents annotations of 12 websites to use for teaching Shakespeare. (SG)

  18. Ethical Issues in Health Services: A Report and Annotated Bibliography.

    Science.gov (United States)

    Carmody, James

    This publication identifies, discusses, and lists areas for further research for five ethical issues related to health services: 1) the right to health care; 2) death and euthanasia; 3) human experimentation; 4) genetic engineering; and, 5) abortion. Following a discussion of each issue is a selected annotated bibliography covering the years 1967…

  19. Program Evaluation in the Arts. An Annotated Bibliography.

    Science.gov (United States)

    Wildemuth, Barbara M., Comp.; Eichinger, Debra S., Comp.

    This 56-item annotated bibliography gives teachers and administrators access to information on the evaluation of school fine arts programs. Based upon a computer search of the Educational Resources Information Center (ERIC) data base, it cites project reports, journal articles, and dissertations published from 1963 to 1976. Citations of elementary…

  20. From sensors to human spatial concepts: An annotated data set

    NARCIS (Netherlands)

    Zivkovic, Z.; Booij, O.; Kröse, B.; Topp, E.A.; Christensen, H.I.

    2008-01-01

    An annotated data set is presented meant to help researchers in developing, evaluating, and comparing various approaches in robotics for building space representations appropriate for communicating with humans. The data consist of omnidirectional images, laser range scans, sonar readings, and robot

  1. An Annotated Bibliography on the Severely and Profoundly Mentally Retarded.

    Science.gov (United States)

    Cass, Michael, Comp.; Schilit, Jeffrey, Comp.

    Presented is an annotated bibliography with approximately 250 entries relating to the severely and profoundly retarded. Citations are listed alphabetically by author under the following categories: assessments, measurements, evaluations; associations; attending behavior; behavior modification; books; classical conditioning; cognitive development;…

  2. Genome Annotation in a Community College Cell Biology Lab

    Science.gov (United States)

    Beagley, C. Timothy

    2013-01-01

    The Biology Department at Salt Lake Community College has used the IMG-ACT toolbox to introduce a genome mapping and annotation exercise into the laboratory portion of its Cell Biology course. This project provides students with an authentic inquiry-based learning experience while introducing them to computational biology and contemporary learning…

  3. An annotated bibliography of parasitic Isopoda (Crustacea of Chondrichthyes

    Directory of Open Access Journals (Sweden)

    Plínio Soares Moreira

    1978-01-01

    Full Text Available This annotated bibliography is an attempt to bring together all available published records on the parasitic isopods of Chondrichthian fishes as a basic reference source. An effort was made to synonymise old names according to the presently accepted scientific names.

  4. Supervised learning of semantic classes for image annotation and retrieval.

    Science.gov (United States)

    Carneiro, Gustavo; Chan, Antoni B; Moreno, Pedro J; Vasconcelos, Nuno

    2007-03-01

    A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.

  5. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  6. Personalization in crowd-driven annotation for cultural heritage collections

    NARCIS (Netherlands)

    Dijkshoorn, C.; Oosterman, J.; Aroyo, L.; Houben, G.J.

    2012-01-01

    Many cultural heritage institutions are confronted with a big challenge when it comes to adapting the process of registration, annotation and digitization of their collections to meet the new technological demands for providing their collections online with Web and mobile technologies. With limited

  7. Annotated Bibliography on the Teaching of Psychology: 1999.

    Science.gov (United States)

    Johnson, David E.; Schroder, Simone I.

    2000-01-01

    Presents an annotated bibliography covering awards, computers and technology, critical thinking, developmental psychology and aging, ethics, graduate education and training issues, high school psychology, history, introductory psychology, learning and cognition, perception/physiological/comparative psychology, research methods and research-related…

  8. Postsecondary Peer Cooperative Learning Programs: Annotated Bibliography 2016

    Science.gov (United States)

    Arendale, David R., Comp.

    2016-01-01

    Purpose: This 2016 annotated bibliography reviews seven postsecondary peer cooperative learning programs that have been implemented nationally to increase student achievement. Methodology: An extensive literature search was conducted of published journal articles, newspaper accounts, book chapters, books, ERIC documents, thesis and dissertations,…

  9. Effects of Teaching Strategies in Annotated Bibliography Writing

    Science.gov (United States)

    Tan-de Ramos, Jennifer

    2015-01-01

    The study examines the effect of teaching strategies to improved writing of students in the tertiary level. Specifically, three teaching approaches--the use of modelling, grammar-based, and information element-focused--were tested on their effect on the writing of annotated bibliography in three research classes at a university in Manila.…

  10. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 7.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  11. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 11.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  12. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 3.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  13. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 10.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  14. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 9.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  15. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 6.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  16. [Cartel]: Monthly Annotated Bibliography of Bilingual Bicultural Materials, No. 1.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  17. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 5.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  18. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 8.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  19. Cartel: Annotated Bibliography of Bilingual Bicultural Materials, No. 2.

    Science.gov (United States)

    Dissemination and Assessment Center for Bilingual Education, Austin, TX.

    This booklet contains an annotated listing of instructional materials for use in bilingual-bicultural programs. Each entry includes the following information: title, author or developing agency, name and address of the publisher, publication date, number of pages, language(s) used, intended audience or level, and a descriptive statement. Any…

  20. Believe It or Not: Adding Belief Annotations to Databases

    CERN Document Server

    Gatterbauer, Wolfgang; Khoussainova, Nodira; Suciu, Dan

    2009-01-01

    We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead. We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technic...

  1. Annotating Evidence Based Clinical Guidelines: A Lightweight Ontology

    NARCIS (Netherlands)

    R. Hoekstra; A. de Waard; R. Vdovjak

    2012-01-01

    This paper describes a lightweight ontology for representing annotations of declarative evidence based clinical guidelines. We present the motivation and requirements for this representation, based on an analysis of several guidelines. The ontology provides the means to connect clinical questions an

  2. Annotated Bibliography of Research in the Teaching of English

    Science.gov (United States)

    Beach, Richard; Bigelow, Martha; Dillon, Deborah; Galda, Lee; Helman, Lori; Kalnin, Julie Shalhope; Lewis, Cynthia; O'Brien, David; Jorgensen, Karen; Liang, Lauren; Rijlaarsdam, Gert; Janssen, Tanja

    2006-01-01

    This annotated bibliography of Research in the Teaching of English addresses the following topics: (1) Discourse/Narrative Analysis/Cultural Difference; (2) Literacy; (3) Literary Response/Literature; (4) Reading; (5) Professional Development/Teacher Education; (6) Second Language Literacy; (7) Technology/Media; and (8) Writing.

  3. The WANDAML Markup Language for Digital Document Annotation

    NARCIS (Netherlands)

    Franke, K.; Guyon, I.; Schomaker, L.; Vuurpijl, L.

    2004-01-01

    WANDAML is an XML-based markup language for the annotation and filter journaling of digital documents. It addresses in particular the needs of forensic handwriting data examination, by allowing experts to enter information about writer, material (pen, paper), script and content, and to record chains

  4. Genome annotations - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ....zip File URL: ftp://ftp.biosciencedbc.jp/archive/kome/LATEST/kome_genome_annotat...e Update History of This Database Site Policy | Contact Us Genome annotations - KOME | LSDB Archive ...

  5. The Performance Career of Charles Dickens: An Annotated Bibliography.

    Science.gov (United States)

    Gentile, John Samuel

    Offered in response to the broad appeal of Charles Dickens's performance career to various disciplines, this annotated bibliography lists 40 resources concerned with Dickens's success as a performer interpreting his literary works. The resources are categorized under books, theses and dissertations, articles in scholarly journals, nineteenth…

  6. An Annotated Guide to Periodical Literature: Higher Education.

    Science.gov (United States)

    Diener, Thomas J., Ed.; Trower, David L., Ed.

    Ninety-six periodicals selected for their pertinence to an understanding of higher education are listed in alphabetical order in this annotated guide. These publications either focus on higher education in the United States and other countries, or frequently publish information about colleges and universities. The bibliography is designed as a…

  7. Annotation des structures discursives : l'expérience ANNODIS

    Directory of Open Access Journals (Sweden)

    Ho-Dac Lydia-Mai

    2014-07-01

    Full Text Available La ressource ANNODIS est un corpus diversifié de français écrit enrichi d'annotations concernant le niveau discursif. Son originalité réside dans sa mutualisation de deux approches complémentaires qui permettent, par leur oppositions et rapprochements, de poser un certain nombre de questions concernant l'annotation de structures discursives. cet article propose de revenir sur les enjeux principaux qui ont motivés les membres du projet ANNODIS : 1 stabiliser un certain nombre de définition linguistique de phénomènes discursives ciblées et 2 confronter aux données réelles une certaine modélisation de la construction de la cohérence discursive. Ce double objectif est révélateur des deux approches mises à l'épreuve dans l'expérience ANNODIS. Cet article revient sur les enjeux de cette ressource en terme à la fois de structures discursives et de campagne d'annotation. Un regard particulier sera porté sur la question du devenir des annotations, notamment dans un domaine encore peu stabilisé.

  8. Health Education: Drugs and Alcohol; An Annotated Bibliography.

    Science.gov (United States)

    Aquino, John; Poliakoff, Lorraine

    This document begins with an article on what drug abuse is and how educators can deal with it. The annotated listing which follows is divided into sections on drug abuse, drug education, alcohol abuse, alcohol education, and venereal disease. Journal articles constitute the majority of the generally post-1971 entries; research studies, books,…

  9. An annotated corpus for the analysis of VP ellipsis

    NARCIS (Netherlands)

    Bos, Johan; Spenader, J.

    2011-01-01

    Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We an

  10. Reliability and effectiveness of clickthrough data for automatic image annotation

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; De Vries, A.P.; Delopoulos, A.

    2010-01-01

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manua

  11. The Open School. An Annotated Bibliography. Current Bibliography No. 4.

    Science.gov (United States)

    Cockburn, Ilze, Comp.

    In 1970, the OISE library published an annotated bibliography entitled, Open Plan (See ED 051 549), that covered the design of open plan schools and the educational practices connected with these facilities. Since then, a clearer distinction has developed between the terms "open plan" and "open education." This revision of the earlier volume…

  12. An Annotated Bibliography on Refugee Mental Health. Volume I.

    Science.gov (United States)

    Williams, Carolyn L.

    This annotated bibliography, spanning a number of relevant disciplines, contains primarily materials in published scientific literature on refugee mental health. References have been grouped into four major sections. Section 1, Understanding Refugees in Context, provides important background material in five categories: cultural and related…

  13. An Annotated Bibliography on Refugee Mental Health. Volume II.

    Science.gov (United States)

    Peterson, Susan C.; And Others

    The second volume of this annotated bibliography contains primarily materials in published scientific literature on refugee mental health. References have been grouped into five major sections. Section 1, Understanding Refugees in Context, provides important background material in five categories: cultural and related information about different…

  14. Children's Books for Times of Stress. An Annotated Bibliography.

    Science.gov (United States)

    Gillis, Ruth J.

    This annotated bibliography evaluates and describes picture books designed to help children, ages three to nine, and their parents cope with a variety of emotionally stressful situations ranging from death, divorce, and hospitalization, to feelings of shyness, jealousy, and fear. Some of the books reflect positive feelings like friendship and…

  15. Annotated Bibliography of Research in the Teaching of English

    Science.gov (United States)

    Beach, Richard; Brendler, Beth; Dillon, Deborah; Dockter, Jessie; Ernst, Stacy; Frederick, Amy; Galda, Lee; Helman, Lori; Kapoor, Richa; Ngo, Bic; O'Brien, David; Scharber, Cassie; Jorgensen, Karen; Liang, Lauren; Braaksma, Martine; Janssen, Tanja

    2010-01-01

    This article presents an annotated bibliography of "Research in the Teaching of English" (RTE). The 2010 version of the bibliography involves a major change--the bibliography is available solely as a downloadable pdf file at http://www.ncte.org/journals/rte/issues/v45-2. As the length of the bibliography has grown from 15 pages in 2003 to 88 pages…

  16. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, Donna M; Scherer, Steven E; Kaul, Rajinder

    2006-01-01

    After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chr...

  17. An Annotation Scheme for Social Interaction in Digital Playgrounds

    NARCIS (Netherlands)

    Moreno, Alejandro M.; Delden, van Robby; Reidsma, Dennis; Poppe, Ronald; Heylen, Dirk; Herrlich, Marc; Malaka, Rainer; Masuch, Maic

    2012-01-01

    This paper introduces a new annotation scheme, designed specifically to study children's social interactions during play in digital playgrounds. The scheme is motivated by analyzing relevant literature, combined with observations from recordings of play sessions. The scheme allows us to analyze how

  18. Annotated checklist of fungi in Cyprus Island. 1. Larger Basidiomycota

    Directory of Open Access Journals (Sweden)

    Miguel Torrejón

    2014-06-01

    Full Text Available An annotated checklist of wild fungi living in Cyprus Island has been compiled broughting together all the information collected from the different works dealing with fungi in this area throughout the three centuries of mycology in Cyprus. This part contains 363 taxa of macroscopic Basidiomycota.

  19. Annotated Bibliography of Research in Adult Education 1983-1988.

    Science.gov (United States)

    New Brunswick Dept. of Education, Fredericton (Canada).

    Research studies conducted in Canada are emphasized in this annotated bibliography of adult education research conducted from 1983-1988, although a few studies from the United States and England are included. The entries, arranged in alphabetical order by author's last name, are in English and, occasionally, French. (CML)

  20. Fostering Creativity and Innovation in the Workforce: An Annotated Bibliography

    Science.gov (United States)

    Sandeen, Cathy A.

    2010-01-01

    This annotated bibliography is a companion piece to "Putting Creativity and Innovation to Work: Continuing Higher Education's Role in Shifting the Educational Paradigm," also in this edition of "Continuing Higher Education Review." The author has provided citations for a selection of books and a brief description of the main idea in each along…

  1. Certifying and reasoning on cost annotations of functional programs

    CERN Document Server

    Amadio, Roberto M

    2011-01-01

    We present a so-called labelling method to insert cost annotations in a higher-order functional program, to certify their correctness with respect to a standard compilation chain to assembly code, and to reason on them in a higher-order Hoare logic.

  2. Animating Peer-Level Annotations Within Web-Based Multimedia

    NARCIS (Netherlands)

    Bulterman, D.C.A.

    2003-01-01

    The TabletPC is an example of a new generation of user interface device where pen-based manipulation of information is integrated directly into a user’s workflow. Using the TabletPC's existing pen and electronic ink systems, a wide range of static documents can be created or annotated. While the fac

  3. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...

  4. OntoPIN: an ontology-annotated PPI database.

    Science.gov (United States)

    Guzzi, Pietro Hiram; Veltri, Pierangelo; Cannataro, Mario

    2013-09-01

    Protein-protein interaction (PPI) data stored in publicly available databases are queried by the use of simple query interfaces allowing only key-based queries. A typical query on such databases is based on the use of protein identifiers and enables the retrieval of one or more proteins. Nevertheless, a lot of biological information is available and is spread on different sources and encoded in different ontologies such as Gene Ontology. The integration of existing PPI databases and biological information may result in richer querying interfaces and successively could enable the development of novel algorithms that may use biological information. The OntoPIN project showed the effectiveness of the introduction of a framework for the ontology-based management and querying of Protein-Protein Interaction Data. The OntoPIN framework first merges PPI data with annotations extracted from existing ontologies (e.g. Gene Ontology) and stores annotated data into a database. Then, a semantic-based query interface enables users to query these data by using biological concepts. OntoPIN allows: (a) to extend existing PPI databases by using ontologies, (b) to enable a key-based querying of annotated data, and (c) to offer a novel query interface based on semantic similarity among annotations.

  5. Promoting reflection through annotations in formal online learning

    NARCIS (Netherlands)

    Verpoorten, Dominique; Westera, Wim; Specht, Marcus

    2012-01-01

    Verpoorten, D., Westera, W., & Specht, M. (2011). Prompting reflection through annotations in formal online learning. In W. Reinhardt, T. D. Ullmann, P. Scott, V. Pammer, O. Conlan, & A. J. Berlanga (Eds.), Proceedings of the 1st European Workshop on Awareness and Reflection in Learning Networks (AR

  6. Towards Integration of End-User Tags with Professional Annotations

    NARCIS (Netherlands)

    Gligorov, R.; Baltussen, L.B.; Ossenbruggen, J.R. van; Aroyo, L.; Brinkerink, M.; Oomen, J.; Ees, A. van

    2010-01-01

    The goal of the paper is assessing the quality of end-user tags from a video labeling game as a first step in the process of integrating them with the annotations made by professionals. Tags lack precise meaning, whereas the terms and concepts the professionals are used to have a clearly defined sem

  7. Education and Training. Annotated Bibliography. Author and Subject Index.

    Science.gov (United States)

    United Nations Food and Agriculture Organization, Rome (Italy).

    Food and Agriculture Organization (FAO) publications and documents issued by the Human Resources and Institutions division and by other technical divisions in the technical, economic, and social fields are selected, annotated and indexed in this bibliography. Documents issued prior to 1967 are not included but can be found in the Rural…

  8. Higher Education Finance. An Annotated Bibliography, Report 96-2.

    Science.gov (United States)

    Doyle, William

    This annotated bibliography on higher education finance lists 79 journal articles, books, conference papers, and reports originally published from 1973 through 1995 with most published in the 1990s. Citations include lengthy analytical summaries and critiques. The bibliography is presented in six sections which cover the following topics: (1)…

  9. Chicano and Chicana Literature, 1992-96: An Annotated Bibliography.

    Science.gov (United States)

    Salazar, Carmen; Herrera-Sobek, Maria

    1997-01-01

    Presents an updated annotated bibliography on recent Chicano publications that includes anthologies focusing on Latino works. This bibliography contains literary works and criticism written in English and Spanish. Its limited scope does not allow the inclusion of entries for articles published in magazines and journals or for unpublished doctoral…

  10. Classic Religious Books for Children: An Annotated Bibliography.

    Science.gov (United States)

    Campbell, Carol, Comp.

    This annotated bibliography of religious books for children contains approximately 450 books, one-fifth of which are Judaic. The books' current availability has been verified using Web sites such as those of individual publishers, the Library of Congress, Amazon.com, or Barnes&Noble.com. New subject headings have been added, such as Kwanza,…

  11. Censorship of Sexual Materials: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Tedford, Thomas L.

    The 37 references in this annotated bibliography are compiled for researchers of information on the censorship of sexual materials from ancient times to present. The materials include case studies, histories, essays, and opinion pieces about the use and regulation of "obscenity" in literature, pictorial art, radio broadcasting, the mail, film, and…

  12. Communication in a Diverse Classroom: An Annotated Bibliographic Review

    Science.gov (United States)

    Brown, Rachelle

    2016-01-01

    Students have social and personal needs to fulfill and communicate these needs in different ways. This annotated bibliographic review examined communication studies to provide educators of diverse classrooms with ideas to build an environment that contributes to student well-being. Participants in the studies ranged in age, ability, and cultural…

  13. Functional annotation of the human retinal pigment epithelium transcriptome

    NARCIS (Netherlands)

    J.C. Booij (Judith); S. van Soest (Simone); S.M.A. Swagemakers (Sigrid); A.H.W. Essing (Anke); J.H.M. Verkerk (Annemieke); P.J. van der Spek (Peter); T.G.M.F. Gorgels (Theo); A.A.B. Bergen (Arthur)

    2009-01-01

    textabstractBackground: To determine level, variability and functional annotation of gene expression of the human retinal pigment epithelium (RPE), the key tissue involved in retinal diseases like age-related macular degeneration and retinitis pigmentosa. Macular RPE cells from six selected healthy

  14. Functional annotation of the human retinal pigment epithelium transcriptome

    NARCIS (Netherlands)

    Booij, J.C.; van Soest, S.; Swagemakers, S.M.A.; Essing, A.H.W.; Verkerk, A.J.M.H.; van der Spek, P.J.; Gorgels, T.G.M.F.; Bergen, A.A.B.

    2009-01-01

    ABSTRACT: BACKGROUND: To determine level, variability and functional annotation of gene expression of the human retinal pigment epithelium (RPE), the key tissue involved in retinal diseases like age-related macular degeneration and retinitis pigmentosa. Macular RPE cells from six selected healthy hu

  15. Using Online Annotations to Support Error Correction and Corrective Feedback

    Science.gov (United States)

    Yeh, Shiou-Wen; Lo, Jia-Jiunn

    2009-01-01

    Giving feedback on second language (L2) writing is a challenging task. This research proposed an interactive environment for error correction and corrective feedback. First, we developed an online corrective feedback and error analysis system called "Online Annotator for EFL Writing". The system consisted of five facilities: Document Maker,…

  16. Annotation in School English: A Social Semiotic Historical Account

    Science.gov (United States)

    Jewitt, Carey; Bezemer, Jeff; Kress, Gunther

    2011-01-01

    What exactly has changed in the production of secondary school English over the last decade? To provide one part of an answer to that question, this paper takes the practice of annotation--a defining activity of the subject English in the UK seldom researched--and uses it as a device for uncovering aspects of changes in the subject. The…

  17. Annotated Bibliography on Return Migration to Puerto Rico.

    Science.gov (United States)

    Carrasquillo, Angela; Carrasquillo, Ceferino

    This paper is an annotated bibliography on return migration from the mainland United States to Puerto Rico. An introduction defines the term "return migration" in the specific context of the Puerto Rican community. The introduction is followed by the bibliography, which lists and summarizes research studies and works dealing with demographic data…

  18. Recall Oriented Search on the Web using Semantic Annotations

    NARCIS (Netherlands)

    Kaptein, A.M.; Broek, E.L. van den; Koot, G.; Huis in't Veld, M.A.A.

    2013-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic investi

  19. The Challenges of Blended Learning Using a Media Annotation Tool

    Science.gov (United States)

    Douglas, Kathy A.; Lang, Josephine; Colasante, Meg

    2014-01-01

    Blended learning has been evolving as an important approach to learning and teaching in tertiary education. This approach incorporates learning in both online and face-to-face modes and promotes deep learning by incorporating the best of both approaches. An innovation in blended learning is the use of an online media annotation tool (MAT) in…

  20. History of American Communication Education: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Friedrich, Gustav W.

    Noting that only a fraction of the articles in speech journals have been concerned with the history of speech education in the United States, this annotated bibliography provides a broad guide to the materials necessary for understanding that history. The 45 citations are organized in six sections concerned with: (1) historical background; (2)…

  1. The genus Lentinus (Basidiomycetes from India - an annotated checklist

    Directory of Open Access Journals (Sweden)

    Sapan Kumar Sharma

    2015-09-01

    Full Text Available An annotated checklist of species of genus Lentinus has been presented in this paper. On scrutiny of the latest authentic literature and mycobank record, out of a total of 41 documented species from India, 20 were found to be valid species while 21 were invalid species which were found to be synonyms. 

  2. An Annotated Bibliography of the Gestalt Methods, Techniques, and Therapy

    Science.gov (United States)

    Prewitt-Diaz, Joseph O.

    The purpose of this annotated bibliography is to provide the reader with a guide to relevant research in the area of Gestalt therapy, techniques, and methods. The majority of the references are journal articles written within the last 5 years or documents easily obtained through interlibrary loans from local libraries. These references were…

  3. An Annotated Bibliography of Games and Simulations in Consumer Education.

    Science.gov (United States)

    Blucker, Gwen

    Thirty-two games and simulations relating to consumer education comprise this annotated bibliography designed to aid the teacher of adult basic education students and others in their search for teaching devices. Topics covered in the various simulations include money management, insurance, credit, credit unions, consumer law, consumer frauds,…

  4. Thermal Environment in School Facilities. A Selected and Annotated Bibliography.

    Science.gov (United States)

    Hartman, Robert R.

    Contains a selected and annotated listing of source material concerning the thermal environment in school facilities. It is directed toward the school planner, architect, or administrator concerned with developing a more functional classroom environment. Topical coverage includes--(1) The Thermal Environment and Learning, (2) Physiological Factors…

  5. Automatic image annotation and retrieval using group sparsity.

    Science.gov (United States)

    Zhang, Shaoting; Huang, Junzhou; Li, Hongsheng; Metaxas, Dimitris N

    2012-06-01

    Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, whereas properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization-based feature selection algorithm to leverage both the sparsity and clustering properties of features, and incorporate it into the image annotation task. Using this group-sparsity-based method, the whole group of features [e.g., red green blue (RGB) or hue, saturation, and value (HSV)] is either selected or removed. Thus, we do not need to extract this group of features when new data comes. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both the keyword similarity and the relevance feedback. Thus, keyword similarity is modeled in the annotation framework. We also show that our framework can be employed in image retrieval tasks by selecting different image pairs. Extensive experiments are designed to compare the performance between features, feature combinations, and regularization-based feature selection methods applied on the image annotation task, which gives insight into the properties of features in the image annotation task. The experimental results demonstrate that the group-sparsity-based method is more accurate and stable than others.

  6. Effects of E-Textbook Instructor Annotations on Learner Performance

    Science.gov (United States)

    Dennis, Alan R.; Abaci, Serdar; Morrone, Anastasia S.; Plaskoff, Joshua; McNamara, Kelly O.

    2016-01-01

    With additional features and increasing cost advantages, e-textbooks are becoming a viable alternative to paper textbooks. One important feature offered by enhanced e-textbooks (e-textbooks with interactive functionality) is the ability for instructors to annotate passages with additional insights. This paper describes a pilot study that examines…

  7. Annotation of Articles from Scientific American and Student Understanding

    Science.gov (United States)

    Knapp, John, II

    1976-01-01

    Reports on a study in which high school biology students were divided into two groups: one read "Scientific American" articles and the other group read annotated "Scientific American" articles. Although there was no significant difference between means on an achievement measure of the groups, the author reports that students preferred the…

  8. Optimizing high performance computing workflow for protein functional annotation.

    Science.gov (United States)

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  9. Bibliographie Annotee de Linguistique Acadienne (Annotated Bibliography of Acadian Linguistics).

    Science.gov (United States)

    Gesner, Edward

    A bibliography of 430 books, journal articles, papers, and other references on Acadian French written in English or French is divided into two principal sections: an annotated bibliography of works focusing on the Acadian French dialect spoken in the Canadian Maritime Provinces, and an unannotated bibliography pertaining to Louisiana Acadian…

  10. Semantic annotation of morphological descriptions: an overall strategy

    Directory of Open Access Journals (Sweden)

    Cui Hong

    2010-05-01

    Full Text Available Abstract Background Large volumes of morphological descriptions of whole organisms have been created as print or electronic text in a human-readable format. Converting the descriptions into computer- readable formats gives a new life to the valuable knowledge on biodiversity. Research in this area started 20 years ago, yet not sufficient progress has been made to produce an automated system that requires only minimal human intervention but works on descriptions of various plant and animal groups. This paper attempts to examine the hindering factors by identifying the mismatches between existing research and the characteristics of morphological descriptions. Results This paper reviews the techniques that have been used for automated annotation, reports exploratory results on characteristics of morphological descriptions as a genre, and identifies challenges facing automated annotation systems. Based on these criteria, the paper proposes an overall strategy for converting descriptions of various taxon groups with the least human effort. Conclusions A combined unsupervised and supervised machine learning strategy is needed to construct domain ontologies and lexicons and to ultimately achieve automated semantic annotation of morphological descriptions. Further, we suggest that each effort in creating a new description or annotating an individual description collection should be shared and contribute to the "biodiversity information commons" for the Semantic Web. This cannot be done without a sound strategy and a close partnership between and among information scientists and biologists.

  11. DIYA: A Bacterial Annotation Pipeline for any Genomics Lab

    Science.gov (United States)

    2009-02-12

    microbial genomes overnight (Mardis, 2008). These technologies have created many new small ‘genome centers’ ( Zwick , 2005). DIYA (Do-It- Yourself...2008) The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation. BMC Bioinformatics, 9, 52. Zwick ,M.E

  12. Annotated Bibliography of Books on How To State Behavioral Objectives.

    Science.gov (United States)

    McGill Univ., Montreal (Quebec). Center for Learning and Development.

    This annotated bibliography contains listings of 17 books on how to state behavioral objectives. Most of the books refer specifically to designing programmed materials, but the procedures and principles apply to general instructional design and evaluation. Books and articles about behavioral objectives are not included. (MBM)

  13. Fluid annotations through open hypermedia: Using and extending emerging Web standards

    DEFF Research Database (Denmark)

    Bouvin, Niels Olof; Zellweger, Polle Trescott; Grønbæk, Kaj

    2002-01-01

    The Fluid Documents project has developed various research prototypes that show that powerful annotation techniques based on animated typographical changes can help readers utilize annotations more effectively. Our recently-developed Fluid Open Hypermedia prototype supports the authoring and brow...

  14. Laughter annotations in conversational speech corpora - possibilities and limitations for phonetic analysis

    NARCIS (Netherlands)

    Truong, Khiet P.; Trouvain, Jürgen

    2012-01-01

    Existing laughter annotations provided with several publicly available conversational speech corpora (both multiparty and dyadic conversations) were investigated and compared. We discuss the possibilities and limitations of these rather coarse and shallow laughter annotations. There are definition i

  15. A Review of Methods of Instance-based Automatic Image Annotation

    Directory of Open Access Journals (Sweden)

    Morad Derakhshan

    2016-12-01

    Full Text Available Today, to use automatic image annotation in order to fill the semantic gap between low level features of images and understanding their information in retrieving process has become popular. Since automatic image annotation is crucial in understanding digital images several methods have been proposed to automatically annotate an image. One of the most important of these methods is instance-based image annotation. As these methods are vastly used in this paper, the most important instance-based image annotation methods are analyzed. First of all the main parts of instance-based automatic image annotation are analyzed. Afterwards, the main methods of instance-based automatic image annotation are reviewed and compared based on various features. In the end the most important challenges and open-ended fields in instance-based image annotation are analyzed.

  16. Semi-automatic Annotation System for OWL-based Semantic Search

    Directory of Open Access Journals (Sweden)

    C.-H. Liu

    2009-11-01

    Full Text Available Current keyword search by Google, Yahoo, and so on gives enormous unsuitable results. A solution to this perhaps is to annotate semantics to textual web data to enable semantic search, rather than keyword search. However, pure manual annotation is very time-consuming. Further, searching high level concept such as metaphor cannot be done if the annotation is done at a low abstraction level. We, thus, present a semi-automatic annotation system, i.e. an automatic annotator and a manual annotator. Against the web ontology language (OWL terms defined by Protégé, the former annotates the textual web data using the Knuth-Morris-Pratt (KMP algorithm, while the latter allows a user to use the terms to annotate metaphors with high abstraction. The resulting semantically-enhanced textual web document can be semantically processed by other web services such as the information retrieval system and the recommendation system shown in our example.

  17. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available Abstract Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  18. Giving a Sense: A Pilot Study in Concept Annotation from Multiple Resources

    OpenAIRE

    Bojar, Ondřej; Sudarikov, Roman

    2015-01-01

    We present a pilot study of a web-based annotation of words with senses. The annotated senses come from several knowledge bases and sense inventories. The study is the first step in a planned larger annotation of grounding and should allow us to select a subset of the sense sources that cover any given text reasonably well and show an acceptable level of inter-annotator agreement.

  19. Semantic Annotation Framework For Intelligent Information Retrieval Using KIM Architecture

    Directory of Open Access Journals (Sweden)

    Sanjay Kumar Malik

    2010-11-01

    Full Text Available Due to the explosion of information/knowledge on the web and wide use of search engines for desiredinformation,the role of knowledge management(KM is becoming more significant in an organization.Knowledge Management in an Organization is used to create ,capture, store, share, retrieve and manageinformation efficiently. The semantic web, an intelligent and meaningful web, tend to provide a promisingplatform for knowledge management systems and vice versa, since they have the potential to give eachother the real substance for machine-understandable web resources which in turn will lead to anintelligent, meaningful and efficient information retrieval on web. Today,the challenge for web communityis to integrate the distributed heterogeneous resources on web with an objective of an intelligent webenvironment focusing on data semantics and user requirements. Semantic Annotation(SA is being widelyused which is about assigning to the entities in the text and links to their semantic descriptions. Varioustools like KIM, Amaya etc may be used for semantic Annotation.In this paper, we introduce semantic annotation as one of the key technology in an intelligent webenvironment , then revisit and review, discuss and explore about Knowledge Management and SemanticAnnotation. A Knowledge Management Framework and a Framework for Semantic Annotation andSemantic Search with Knowledge Base(GATE and Ontology have been presented. Then KIM Annotationplatform architecture including KIM Ontology(KIMO, KIM Knowledge Base and KIM front ends havebeen highlighted. Finally, intelligent pattern search and concerned GATE framework with a KIMAnnotation Example have been illiustrated towards an intelligent information retrieval

  20. Missing genes in the annotation of prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Feng Wu-chun

    2010-03-01

    Full Text Available Abstract Background Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting. Therefore the question arises as to whether current genome annotations have systematically missing, small genes. Results We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations. The vast majority of the missing genes found are small (less than 100 aa. A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.

  1. SAS- Semantic Annotation Service for Geoscience resources on the web

    Science.gov (United States)

    Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

    2015-12-01

    There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosematnic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use of the existing resources based on standard vocabularies that are encoded and published using semantic technologies.

  2. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs

    Directory of Open Access Journals (Sweden)

    Wang Zhouxi

    2013-02-01

    Full Text Available Abstract Background The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA, for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site. Results Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC, is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l, a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w, Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t has been reported to be an enoyl-CoA hydratase (ECH, but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH, a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s. This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase. Conclusions

  3. Digital Ink: In-Class Annotation of PowerPoint Lectures

    Science.gov (United States)

    Johnson, Anne E.

    2008-01-01

    Digital ink is a tool that, in conjunction with Microsoft PowerPoint software, allows real-time freehand annotation of presentations. Annotation of slides during class encourages student engagement with the material and problems under discussion. Digital ink annotation is a technique suitable for teaching across many disciplines, but is especially…

  4. Integrating Annotations into a Dual-Slide PowerPoint Presentation for Classroom Learning

    Science.gov (United States)

    Lai, Yen-Shou; Tsai, Hung-Hsu; Yu, Pao-Ta

    2011-01-01

    This study introduces a learning environment integrating annotations with a dual-slide PowerPoint presentation for classroom learning. Annotation means a kind of additional information to emphasize the explanations for the learning objects. The use of annotations is to support the cognitive process for PowerPoint presentation in a classroom. The…

  5. Effects of Annotations and Homework on Learning Achievement: An Empirical Study of Scratch Programming Pedagogy

    Science.gov (United States)

    Su, Addison Y. S.; Huang, Chester S. J.; Yang, Stephen J. H.; Ding, T. J.; Hsieh, Y. Z.

    2015-01-01

    In Taiwan elementary schools, Scratch programming has been taught for more than four years. Previous studies have shown that personal annotations is a useful learning method that improve learning performance. An annotation-based Scratch programming (ASP) system provides for the creation, share, and review of annotations and homework solutions in…

  6. A Collaborative Multimedia Annotation Tool for Enhancing Knowledge Sharing in CSCL

    Science.gov (United States)

    Yang, Stephen J. H.; Zhang, Jia; Su, Addison Y. S.; Tsai, Jeffrey J. P.

    2011-01-01

    Knowledge sharing in computer supported collaborative learning (CSCL) requires intensive social interactions among participants, typically in the form of annotations. An annotation refers to an explicit expression of knowledge that is attached to a document to reveal the conceptual meanings of an annotator's implicit thoughts. In this research, we…

  7. A step-wise approach to discourse annotation : Towards a reliable categorization of coherence relations

    NARCIS (Netherlands)

    Scholman, M.C.J.; Evers-Vermeul, J.; Sanders, T.J.M.

    2016-01-01

    Over the last decennia, annotating discourse coherence relations has gained increasing interest of the linguistics research community. Because of the complexity of coherence relations, there is no agreement on an annotation standard. Current annotation methods often lack a systematic order of cohere

  8. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Directory of Open Access Journals (Sweden)

    Kevin R Ramkissoon

    Full Text Available The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  9. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    Directory of Open Access Journals (Sweden)

    Abdulaziz M Al-Swailem

    Full Text Available Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/, hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.

  10. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Science.gov (United States)

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  11. Incorporating evolution of transcription factor binding sites into annotated alignments

    Indian Academy of Sciences (India)

    Abha S Bais; Steffen Grossmann; Martin Vingron

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield ``conserved TFBSs”. Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits) are generated. Moreover, the pair-profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution. Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification

  12. Genome annotation of a Saccharomyces sp. lager brewer's yeast

    Directory of Open Access Journals (Sweden)

    Patricia Marcela De León-Medina

    2016-09-01

    Full Text Available The genome of lager brewer's yeast is a hybrid, with Saccharomyces eubayanus and Saccharomyces cerevisiae as sub-genomes. Due to their specific use in the beer industry, relatively little information is available. The genome of brewing yeast was sequenced and annotated in this study. We obtained a genome size of 22.7 Mbp that consisted of 133 scaffolds, with 65 scaffolds larger than 10 kbp. With respect to the annotation, 9939 genes were obtained, and when they were submitted to a local alignment, we found that 53.93% of these genes corresponded to S. cerevisiae, while another 42.86% originated from S. eubayanus. Our results confirm that our strain is a hybrid of at least two different genomes.

  13. Using social annotation and web log to enhance search engine

    CERN Document Server

    Nguyen, Vu Thanh

    2009-01-01

    Search services have been developed rapidly in social Internet. It can help web users easily to find their documents. So that, finding a best method search is always an imagine. This paper would like introduce hybrid method of LPageRank algorithm and Social Sim Rank algorithm. LPageRank is the method using link structure to rank priority of page. It doesn't care content of page and content of query. Therefore, we want to use benefit of social annotations to create the latent semantic association between queries and annotations. This model, we use algorithm SocialPageRank and LPageRank to enhance accuracy of search system. To experiment and evaluate the proposed of the new model, we have used this model for Music Machine Website with their web logs.

  14. Automatic Plant Annotation Using 3D Computer Vision

    DEFF Research Database (Denmark)

    Nielsen, Michael

    -of-squared difference methods and energy-minimizing methods had this problem. Following the test a series of disparity estimation techniques were developed and tested in the test framework using a set of ray traced images and a hand-annotated set of real plants with similar plant shapes. Novel similarity measures were...... individual leaf had to be seperated using a simple labeling algorithm based on connected components analysis (based on Rosenfeld and Pfaltz union-find algorithm) where height information was also included, described as a NURBS surface and annotated with Number of leaves on plant, Area of plant and individual...... reconstruction and extracted features can be used to locate sample locations for multi-spectral reflection analysis. It can be employed during early growth stages in low cost crops or in high cost crops where the grown plants are placed separately. The image acquisition can be done from manually operated...

  15. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second......, some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism...... are not effective unless programmers are told how and when they are benecial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs performes signicantly...

  16. Scaling up genome annotation using MAKER and work queue.

    Science.gov (United States)

    Thrasher, Andrew; Musgrave, Zachary; Kachmarck, Brian; Thain, Douglas; Emrich, Scott

    2014-01-01

    Next generation sequencing technologies have enabled sequencing many genomes. Because of the overall increasing demand and the inherent parallelism available in many required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x using 50 workers using a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation (MAKER) is parallelised as an MPI application. Our framework enables it to now run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Combined, our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.

  17. Automation and Validation of Annotation for Hindi Anaphora Resolution

    Directory of Open Access Journals (Sweden)

    Pardeep Singh

    2015-10-01

    Full Text Available The process of labelling any language genre by which one can extract useful information is called annotation. This provides syntactic information about a word or a word phrase. In this paper, an effort has been made to provide the algorithm for semiautomatic annotation for Hindi text to cater anaphora resolution only. The study was conducted on twelve files of Ranchi Express available in EMILLE corpus. The corpus is originally tagged for demonstrative pronouns. The detection of the pronouns is supported by the incorporation of seven tags. However the semantic interpretation of the demonstrative pronoun is not supported in the original corpus. In this paper an effort has been made to automate the process of tagging as well as the handling of semantic information through addition tags. It was conducted on 1485 demonstrative pronouns. The average accuracy of precision, recall and F measure is 74, 71 and 72 respectively.

  18. Code Generation for Protocols from CPN models Annotated with Pragmatics

    DEFF Research Database (Denmark)

    Simonsen, Kent Inge; Kristensen, Lars Michael; Kindler, Ekkart

    Model-driven engineering (MDE) provides a foundation for automatically generating software based on models. Models allow software designs to be specified focusing on the problem domain and abstracting from the details of underlying implementation platforms. When applied in the context of formal...... modelling languages, MDE further has the advantage that models are amenable to model checking which allows key behavioural properties of the software design to be verified. The combination of formally verified models and automated code generation contributes to a high degree of assurance that the resulting...... of the same model and sufficiently detailed to serve as a basis for automated code generation when annotated with code generation pragmatics. Pragmatics are syntactical annotations designed to make the CPN models descriptive and to address the problem that models with enough details for generating code from...

  19. Bulgarian sense-annotated corpus – between the tradition and novelty

    Directory of Open Access Journals (Sweden)

    Svetla Koeva

    2015-11-01

    Full Text Available Bulgarian sense-annotated corpus – between the tradition and novelty The Bulgarian Sense-annotated Corpus (BulSemCor is compiled according to the general methodology established by the SemCor project. It is a subset of the Brown Corpus of Bulgarian semantically annotated with a corresponding synonym set (synset in the Bulgarian wordnet. Unlike the bulk of sense-annotated corpora where only (sets of content words are annotated, in BulSemCor each lexical unit has been assigned a sense. The main contributions achieved in the work on BulSemCor are briefly decides in the presented paper: definition of an annotation schema, compilation of an input corpus, development of a sense-annotated corpus, Bulgarian wordnet enlargement.

  20. Shallow syntactic annotation in the corpus of Wrocław University of Technology

    Directory of Open Access Journals (Sweden)

    Adam Radziszewski

    2015-11-01

    Full Text Available Shallow syntactic annotation in the corpus of Wrocław University of Technology In this paper we present shallow syntactic annotation of The Wrocław University of Technology Corpus. We discuss some theoretical and practical considerations related to shallow parsing of Polish, then we present our annotation guidelines. The proposed annotation scheme includes chunking – four chunk types are defined with reference to the notion of accommodation and syntactic connotation, as well as annotation of four inter-chunk predicate-argument relations. Until now almost 18k chunk and 4k relation instances have been annotated. We believe that both the corpus and the annotation guideliness will prove their applicability in construction of automatic shallow parsers.

  1. Towards Annotopia—Enabling the Semantic Interoperability of Web-Based Annotations

    Directory of Open Access Journals (Sweden)

    Anna Gerber

    2012-08-01

    Full Text Available This paper describes the results of a collaborative effort that has reconciled the Open Annotation Collaboration (OAC ontology and the Annotation Ontology (AO to produce a merged data model [the Open Annotation (OA data model] to describe Web-based annotations—and hence facilitate the discovery, sharing and re-use of such annotations. Using a number of case studies that include digital scholarly editing, 3D museum artifacts and sensor data streams, we evaluate the OA model’s capabilities. We also describe our implementation of an online annotation server that supports the storage, search and retrieval of OA-compliant annotations across multiple applications and disciplines. Finally we discuss outstanding problem issues associated with the OA ontology, and the impact that certain design decisions have had on the efficient storage, indexing, search and retrieval of complex structured annotations.

  2. Analysis of LYSA-calculus with explicit confidentiality annotations

    DEFF Research Database (Denmark)

    Gao, Han; Nielson, Hanne Riis

    2006-01-01

    protocols. A static analysis approach is developed for analyzing protocols specified in the extended LYSA. The proposed approach will over-approximate the possible executions of protocols while keeping track of all messages communicated over the network, and furthermore it will capture the potential...... malicious activities performed by attackers as specified by the confidentiality annotations. The proposed analysis approach is fully automatic without the need of human intervention and has been applied successfully to a number of protocols....

  3. Construction of a Phonotactic Dialect Corpus using Semiautomatic Annotation

    Science.gov (United States)

    2007-01-01

    phonological or phonetic feature is speaker specific or commonly found in that speaker’s dialect. Historically, the sociolinguistic study of dialect...more detail in Section 4. Identification of dialect-specific phonetic / phonological effects: This is, of course, the primary goal: detailed labeling of... phonetic / phonological transforms per dialect and the true dialect of the audio file being annotated, finds regions in which a phonetic / phonological

  4. Grass buffers for playas in agricultural landscapes: An annotated bibliography

    Science.gov (United States)

    Melcher, Cynthia P.; Skagen, Susan K.

    2005-01-01

    This bibliography and associated literature synthesis (Melcher and Skagen, 2005) was developed for the Playa Lakes Joint Venture (PLJV). The PLJV sought compilation and annotation of the literature on grass buffers for protecting playas from runoff containing sediments, nutrients, pesticides, and other contaminants. In addition, PLJV sought information regarding the extent to which buffers may attenuate the precipitation runoff needed to fill playas, and avian use of buffers. We emphasize grass buffers, but we also provide information on other buffer types.

  5. An annotated checklist of the Cladocera (Crustacea: Branchiopoda) of Colombia.

    Science.gov (United States)

    Kotov, Alexey A; Fuentes-Reinés, Juan M

    2015-11-20

    Based on the revision of available literature on the Colombian Cladocera (Crustacea: Branchiopoda), we present an annotated checklist, with taxonomical comments for all taxa recorded since the start of research on this group in the country in 1913. We have listed 101 valid taxa, of which most records belong to the Caribbean region of Colombia. The situation in Colombian Cladocera taxonomy is, at present, unfavorable for any realistic conclusions on biodiversity, ecology and biogeography.

  6. Terra: a Collection of Translation Error-Annotated Corpora

    OpenAIRE

    Popović, Maja; Fishel, Mark; Bojar, Ondřej

    2012-01-01

    Recently the first methods of automatic diagnostics of machine translation have emerged; since this area of research is relatively young, the efforts are not coordinated. We present a collection of translation error-annotated corpora, consisting of automatically produced trans- lations and their detailed manual translation error analysis. Using the collected corpora we evaluate the available state-of-the-art methods of MT diagnostics and assess, how well the methods perform, how they...

  7. Experimental Evidence for a Revision in the Annotation of Putative Pyridoxamine 5'-Phosphate Oxidases P(N/MP from Fungi.

    Directory of Open Access Journals (Sweden)

    Tatiana Domitrovic

    Full Text Available Pyridoxinamine 5'-phosphate oxidases (P(N/MP oxidases that bind flavin mononucleotide (FMN and oxidize pyridoxine 5'-phosphate or pyridoxamine 5'-phosphate to form pyridoxal 5'-phosphate (PLP are an important class of enzymes that play a central role in cell metabolism. Failure to generate an adequate supply of PLP is very detrimental to most organisms and is often clinically manifested as a neurological disorder in mammals. In this study, we analyzed the function of YLR456W and YPR172W, two homologous genes of unknown function from S. cerevisiae that have been annotated as putative P(N/MP oxidases based on sequence homology. Different experimental approaches indicated that neither protein catalyzes PLP formation nor binds FMN. On the other hand, our analysis confirmed the enzymatic activity of Pdx3, the S. cerevisiae protein previously implicated in PLP biosynthesis by genetic and structural characterization. After a careful sequence analysis comparing the putative and confirmed P(N/MP oxidases, we found that the protein domain (PF01243 that led to the YLR456W and YPR172W annotation is a poor indicator of P(N/MP oxidase activity. We suggest that a combination of two Pfam domains (PF01243 and PF10590 present in Pdx3 and other confirmed P(N/MP oxidases would be a stronger predictor of this molecular function. This work exemplifies the importance of experimental validation to rectify genome annotation and proposes a revision in the annotation of at least 400 sequences from a wide variety of fungal species that are homologous to YLR456W and are currently misrepresented as putative P(N/MP oxidases.

  8. Multi-label Image Annotation by Structural Grouping Sparsity

    Science.gov (United States)

    Han, Yahong; Wu, Fei; Zhuang, Yueting

    We can obtain high-dimensional heterogeneous features from real-world images on photo-sharing website, for an example Flickr. Those features are implemented to describe their various aspects of visual characteristics, such as color, texture and shape etc. The heterogeneous features are often over-complete to describe certain semantic. Therefore, the selection of limited discriminative features for certain semantics is hence crucial to make the image understanding more interpretable. This chapter introduces one approach for multi-label image annotation with a regularized penalty. We call it Multi-label Image Boosting by the selection of heterogeneous features with structural Grouping Sparsity (MtBGS). MtBGS induces a (structural) sparse selection model to identify subgroups of homogeneous features for predicting a certain label. Moreover, the correlations among multiple tags are utilized in MtBGS to boost the performance of multi-label annotation. Extensive experiments on public image datasets show that the proposed approach has better multi-label image annotation performance and leads to a quite interpretable model for image understanding.

  9. iPad: Semantic annotation and markup of radiological images.

    Science.gov (United States)

    Rubin, Daniel L; Rodriguez, Cesar; Shah, Priyanka; Beaulieu, Chris

    2008-11-06

    Radiological images contain a wealth of information,such as anatomy and pathology, which is often not explicit and computationally accessible. Information schemes are being developed to describe the semantic content of images, but such schemes can be unwieldy to operationalize because there are few tools to enable users to capture structured information easily as part of the routine research workflow. We have created iPad, an open source tool enabling researchers and clinicians to create semantic annotations on radiological images. iPad hides the complexity of the underlying image annotation information model from users, permitting them to describe images and image regions using a graphical interface that maps their descriptions to structured ontologies semi-automatically. Image annotations are saved in a variety of formats,enabling interoperability among medical records systems, image archives in hospitals, and the Semantic Web. Tools such as iPad can help reduce the burden of collecting structured information from images, and it could ultimately enable researchers and physicians to exploit images on a very large scale and glean the biological and physiological significance of image content.

  10. Semi-automatic video semantic annotation based on active learning

    Science.gov (United States)

    Song, Yan; Hua, Xian-Sheng; Dai, Li-Rong; Wang, Ren-Hua

    2005-07-01

    In this paper, we propose a novel semi-automatic annotation scheme for home videos based on active learning. It is well-known that there is a large gap between semantics and low-level features. To narrow down this gap, relevance feedback has been introduced in a number of literatures. Furthermore, to accelerate the convergence to the optimal result, several active learning schemes, in which the most informative samples are chosen to be annotated, have been proposed in literature instead of randomly selecting samples. In this paper, a representative active learning method is proposed, which local consistency of video content is effectively taken into consideration. The main idea is to exploit the global and local statistical characteristics of videos, and the temporal relationship between shots. The global model is trained on a smaller pre-labeled video dataset, and the local information is obtained online in the process of active learning, and will be used to adjust the initial global model adaptively. The experiment results show that the proposed active learning scheme has significantly improved the annotation performance compared with random selecting and common active learning method.

  11. A topic modeling approach for web service annotation

    Directory of Open Access Journals (Sweden)

    Leandro Ordóñez-Ante

    2014-06-01

    Full Text Available The actual implementation of semantic-based mechanisms for service retrieval has been restricted, given the resource-intensive procedure involved in the formal specification of services, which generally comprises associating semantic annotations to their documentation sources. Typically, developer performs such a procedure by hand, requiring specialized knowledge on models for semantic description of services (e.g. OWL-S, WSMO, SAWSDL, as well as formal specifications of knowledge. Thus, this semantic-based service description procedure turns out to be a cumbersome and error-prone task. This paper introduces a proposal for service annotation, based on processing web service documentation for extracting information regarding its offered capabilities. By uncovering the hidden semantic structure of such information through statistical analysis techniques, we are able to associate meaningful annotations to the services operations/resources, while grouping those operations into non-exclusive semantic related categories. This research paper belongs to the TelComp 2.0 project, which Colciencas and University of Cauca founded in cooperation.

  12. Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome

    Energy Technology Data Exchange (ETDEWEB)

    Milacic, Marija; Haw, Robin, E-mail: robin.haw@oicr.on.ca; Rothfels, Karen; Wu, Guanming [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada); Croft, David; Hermjakob, Henning [European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD (United Kingdom); D’Eustachio, Peter [Department of Biochemistry, NYU School of Medicine, New York, NY 10016 (United States); Stein, Lincoln [Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON, M5G0A3 (Canada)

    2012-11-08

    Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.

  13. Assessment of protein set coherence using functional annotations

    Directory of Open Access Journals (Sweden)

    Carazo Jose M

    2008-10-01

    Full Text Available Abstract Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities, they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at http://www.cnb.csic.es/~monica/coherence/

  14. Semantic annotation of Web data applied to risk in food.

    Science.gov (United States)

    Hignette, Gaëlle; Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Doussot, David; Haemmerlé, Ollivier; Mettler, Eric; Soler, Lydie

    2008-11-30

    A preliminary step to risk in food assessment is the gathering of experimental data. In the framework of the Sym'Previus project (http://www.symprevius.org), a complete data integration system has been designed, grouping data provided by industrial partners and data extracted from papers published in the main scientific journals of the domain. Those data have been classified by means of a predefined vocabulary, called ontology. Our aim is to complement the database with data extracted from the Web. In the framework of the WebContent project (www.webcontent.fr), we have designed a semi-automatic acquisition tool, called @WEB, which retrieves scientific documents from the Web. During the @WEB process, data tables are extracted from the documents and then annotated with the ontology. We focus on the data tables as they contain, in general, a synthesis of data published in the documents. In this paper, we explain how the columns of the data tables are automatically annotated with data types of the ontology and how the relations represented by the table are recognised. We also give the results of our experimentation to assess the quality of such an annotation.

  15. Identification of novel endogenous antisense transcripts by DNA microarray analysis targeting complementary strand of annotated genes

    Directory of Open Access Journals (Sweden)

    Kohama Chihiro

    2009-08-01

    Full Text Available Abstract Background Recent transcriptomic analyses in mammals have uncovered the widespread occurrence of endogenous antisense transcripts, termed natural antisense transcripts (NATs. NATs are transcribed from the opposite strand of the gene locus and are thought to control sense gene expression, but the mechanism of such regulation is as yet unknown. Although several thousand potential sense-antisense pairs have been identified in mammals, examples of functionally characterized NATs remain limited. To identify NAT candidates suitable for further functional analyses, we performed DNA microarray-based NAT screening using mouse adult normal tissues and mammary tumors to target not only the sense orientation but also the complementary strand of the annotated genes. Results First, we designed microarray probes to target the complementary strand of genes for which an antisense counterpart had been identified only in human public cDNA sources, but not in the mouse. We observed a prominent expression signal from 66.1% of 635 target genes, and 58 genes of these showed tissue-specific expression. Expression analyses of selected examples (Acaa1b and Aard confirmed their dynamic transcription in vivo. Although interspecies conservation of NAT expression was previously investigated by the presence of cDNA sources in both species, our results suggest that there are more examples of human-mouse conserved NATs that could not be identified by cDNA sources. We also designed probes to target the complementary strand of well-characterized genes, including oncogenes, and compared the expression of these genes between mammary cancerous tissues and non-pathological tissues. We found that antisense expression of 95 genes of 404 well-annotated genes was markedly altered in tumor tissue compared with that in normal tissue and that 19 of these genes also exhibited changes in sense gene expression. These results highlight the importance of NAT expression in the regulation

  16. Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

    Directory of Open Access Journals (Sweden)

    Tie Hua Zhou

    2015-05-01

    Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on keyword query architecture that models the probability distribution of image annotations: allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated to semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed in heterogeneous large data sources. Our experiments on SBU (collected by Stony Brook University database show that (i our integrated annotation contains higher quality representatives and semantic matches; and (ii the results indicating annotation integration can indeed improve image search result quality.

  17. Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets

    Science.gov (United States)

    Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M. Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J.

    2014-01-01

    Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. PMID:24501397

  18. ArrayIDer: automated structural re-annotation pipeline for DNA microarrays

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2009-01-01

    Full Text Available Abstract Background Systems biology modeling from microarray data requires the most contemporary structural and functional array annotation. However, microarray annotations, especially for non-commercial, non-traditional biomedical model organisms, are often dated. In addition, most microarray analysis tools do not readily accept EST clone names, which are abundantly represented on arrays. Manual re-annotation of microarrays is impracticable and so we developed a computational re-annotation tool (ArrayIDer to retrieve the most recent accession mapping files from public databases based on EST clone names or accessions and rapidly generate database accessions for entire microarrays. Results We utilized the Fred Hutchinson Cancer Research Centre 13K chicken cDNA array – a widely-used non-commercial chicken microarray – to demonstrate the principle that ArrayIDer could markedly improve annotation. We structurally re-annotated 55% of the entire array. Moreover, we decreased non-chicken functional annotations by 2 fold. One beneficial consequence of our re-annotation was to identify 290 pseudogenes, of which 66 were previously incorrectly annotated. Conclusion ArrayIDer allows rapid automated structural re-annotation of entire arrays and provides multiple accession types for use in subsequent functional analysis. This information is especially valuable for systems biology modeling in the non-traditional biomedical model organisms.

  19. AGeS: a software system for microbial genome sequence annotation.

    Directory of Open Access Journals (Sweden)

    Kamal Kumar

    Full Text Available BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA. The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

  20. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders. While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C