WorldWideScience

Sample records for metagenomics extracting information

  1. Metagenomes provide valuable comparative information on soil microeukaryotes

    DEFF Research Database (Denmark)

    Jacquiod, Samuel Jehan Auguste; Stenbæk, Jonas; Santos, Susana

    2016-01-01

    ... providing microbiologists with substantial amounts of accessible information. We took advantage of public metagenomes in order to investigate microeukaryote communities in a well characterized grassland soil. The data gathered allowed the evaluation of several factors impacting the community structure, including the DNA extraction method, the database choice and also the annotation procedure. While most studies on soil microeukaryotes are based on sequencing of PCR-amplified taxonomic markers (18S rRNA genes, ITS regions), this work represents, to our knowledge, the first report based solely ... Our analyses suggest that publicly available metagenome data can provide valuable information on soil microeukaryotes for comparative purposes when handled appropriately, complementing the current view provided by ribosomal amplicon sequencing methods.

  2. Rapid and efficient method to extract metagenomic DNA from estuarine sediments.

    Science.gov (United States)

    Shamim, Kashif; Sharma, Jaya; Dubey, Santosh Kumar

    2017-07-01

    Metagenomic DNA from sediments of selected estuaries of Goa, India, was extracted using a simple, fast, efficient and environmentally friendly method. The recovery of pure metagenomic DNA with our method was significantly higher than with other well-known methods, since the concentration of recovered metagenomic DNA ranged from 1185.1 to 4579.7 µg/g of sediment. The purity of the metagenomic DNA was also considerably high, as the ratio of absorbance at 260 and 280 nm ranged from 1.88 to 1.94. The recovered metagenomic DNA was therefore used directly in various molecular biology experiments, viz. restriction digestion, PCR amplification, cloning and metagenomic library construction. This clearly showed that our protocol for metagenomic DNA extraction using silica gel efficiently removed contaminants and prevented shearing of the metagenomic DNA. Thus, this modified method can be used to recover pure metagenomic DNA from various estuarine sediments in a rapid, efficient and eco-friendly manner.
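
    The yield and purity figures quoted above follow from standard spectrophotometric conversions. As an illustration only (the authors publish no code), here is a minimal sketch of how such values are typically derived from absorbance readings, assuming the common conversion factor of 50 µg/mL per A260 unit for double-stranded DNA; the readings below are hypothetical, not the study's data.

        # Illustrative only: spectrophotometric estimates of DNA yield and purity.
        # Assumes the standard conversion of 50 ug/mL per A260 unit for dsDNA.

        def dna_concentration_ug_per_ml(a260, dilution_factor=1.0):
            """Estimate dsDNA concentration (ug/mL) from absorbance at 260 nm."""
            return a260 * 50.0 * dilution_factor

        def purity_ratio(a260, a280):
            """A260/A280 ratio; roughly 1.8-2.0 indicates protein-free DNA."""
            return a260 / a280

        # Hypothetical readings for one sediment extract (not the authors' data):
        a260, a280 = 0.91, 0.47
        conc = dna_concentration_ug_per_ml(a260, dilution_factor=100)
        print(f"Concentration: {conc:.1f} ug/mL, A260/A280: {purity_ratio(a260, a280):.2f}")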

  3. EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

    Science.gov (United States)

    Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

    2016-01-01

    The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual annotation of samples is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.

  4. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  5. Information extraction system

    Science.gov (United States)

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.

  6. Databases of the marine metagenomics

    KAUST Repository

    Mineta, Katsuhiko

    2015-10-28

    Metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. Compared with the conventional amplicon-based approach to metagenomics, the more recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of capturing the diversity of an entire microbial community at a sampling point in the sea. However, this approach accelerates both the accumulation of metagenome data and the increase in data complexity. Moreover, when the metagenomic approach is used to monitor temporal changes in marine environments at multiple locations, metagenomic data accumulate at an enormous speed. Because this situation is already becoming a reality at many marine research institutions and stations around the world, data management and analysis will clearly be confronted by the so-called Big Data issues, such as how databases can be constructed efficiently and how useful knowledge can be extracted from a vast amount of data. In this review, we summarize the major databases of marine metagenomes that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that the number of metagenome databases that include marine metagenome data is, perhaps unexpectedly, still small at six. We also describe what we call reference databases, which are useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing the marine metagenome database.

  7. An Improved Methodology to Overcome Key Issues in Human Fecal Metagenomic DNA Extraction

    Directory of Open Access Journals (Sweden)

    Jitendra Kumar

    2016-12-01

    Microbes are ubiquitously distributed in nature, and recent culture-independent studies have highlighted the significance of gut microbiota in human health and disease. Fecal DNA is the primary source for the majority of human gut microbiome studies. However, further improvement is needed to obtain fecal metagenomic DNA in sufficient amount and of good quality but with low host genomic DNA contamination. In the current study, we demonstrate a quick, robust, unbiased, and cost-effective method for the isolation of high-molecular-weight (>23 kb) metagenomic DNA (260/280 ratio >1.8) with a good yield (55.8 ± 3.8 ng/mg of feces). We also confirm that there is very low human genomic DNA contamination (eubacterial:human genomic DNA marker genes = 227.9:1) in the human feces. The newly developed method performs robustly for fresh as well as stored fecal samples, as demonstrated by 16S rRNA gene sequencing using 454 FLX+. Moreover, 16S rRNA gene analysis indicated that, compared to the other DNA extraction methods tested, the fecal metagenomic DNA isolated with the current methodology retains species richness and does not show microbial diversity biases, which is further confirmed by qPCR with a known quantity of spike-in genomes. Overall, our data highlight a protocol that balances quality, amount, user-friendliness, and cost effectiveness, making it suitable for culture-independent analysis of the human gut microbiome and providing a robust solution to key issues associated with fecal metagenomic DNA isolation in human gut microbiome studies.

  8. Multimedia Information Extraction

    CERN Document Server

    Maybury, Mark T

    2012-01-01

    The advent of increasingly large consumer collections of audio (e.g., iTunes), imagery (e.g., Flickr), and video (e.g., YouTube) is driving a need not only for multimedia retrieval but also for information extraction from and across media. Furthermore, industrial and government collections fuel requirements for stock media access, media preservation, broadcast news retrieval, identity management, and video surveillance. While significant advances have been made in language processing for information extraction from unstructured multilingual text and in extraction of objects from imagery and video...

  9. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    Energy Technology Data Exchange (ETDEWEB)

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to TRF103 of the isolates Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.
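
    The reported ~40% overlap amounts to comparing sets of terminal-restriction fragment sizes between DNA sources. As a rough sketch of such a comparison (not the study's pipeline), using hypothetical fragment lists and a small sizing tolerance:

        # Sketch: fraction of chitinase TRFs shared between two DNA sources.
        # Fragment sizes (bp) below are hypothetical placeholders.

        def shared_fraction(trfs_a, trfs_b, tolerance=1):
            """Fraction of TRFs in trfs_a matching a TRF in trfs_b
            within `tolerance` base pairs (to absorb sizing error)."""
            matched = sum(
                any(abs(a - b) <= tolerance for b in trfs_b) for a in trfs_a
            )
            return matched / len(trfs_a)

        library_trfs = [103, 145, 198, 221, 310]      # metagenomic library
        direct_trfs = [103, 146, 222, 287, 310, 355]  # directly extracted DNA
        print(f"Shared TRFs: {shared_fraction(library_trfs, direct_trfs):.0%}")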

  10. Applying Shannon's information theory to bacterial and phage genomes and metagenomes

    Science.gov (United States)

    Akhter, Sajia; Bailey, Barbara A.; Salamon, Peter; Aziz, Ramy K.; Edwards, Robert A.

    2013-01-01

    All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
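
    The abstract does not give the authors' implementation, but Shannon's index over sequence "words" (k-mers) of a chosen size can be sketched as follows; the word size and toy sequence are arbitrary choices for illustration.

        import math
        from collections import Counter

        def shannon_index(sequence, word_size=3):
            """Shannon uncertainty (bits) of overlapping k-mer frequencies."""
            words = [sequence[i:i + word_size]
                     for i in range(len(sequence) - word_size + 1)]
            counts = Counter(words)
            total = sum(counts.values())
            return -sum((n / total) * math.log2(n / total)
                        for n in counts.values())

        # Toy example; real inputs would be genome or metagenome reads.
        print(shannon_index("ATGCGCGTATAGCGCATATGCGT", word_size=3))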

  11. PROTOCOL FOR EXTRACTION OF BACTERIAL METAGENOMIC DNA FROM THE PRAWN Macrobrachium carcinus L.

    Directory of Open Access Journals (Sweden)

    J U González de la Cruz

    2011-07-01

    In this work we adapted a protocol for the extraction of bacterial metagenomic DNA (ADNmg) from the digestive system (intestine, stomach and hepatopancreas) of Macrobrachium carcinus L., with reference to a method for extracting bacterial DNA from soils and sediments (Rojas-Herrera et al., 2008). The methodology combined enzymatic, physical, mechanical and chemical lysis; after a series of tests, the enzymatic lysis step was omitted. The success of ADNmg extraction was influenced mainly by the preparation of the samples, in particular the hepatopancreas, where it was necessary to remove the fat by thermal shock and phase separation by centrifugation with the sample frozen. The isolated DNA was assessed by denaturing gradient gel electrophoresis (DGGE) after amplification with universal primers. In general, a low diversity was observed (19 phylotypes across the different organs analyzed), ranging from 13.5 ± 1 (intestine) to 11.7 ± 0.96 (stomach). The Shannon-Weaver index (2.45), Simpson's index (10.88) and evenness (0.972), obtained from digitization of the gel image, suggest that the phylotypes forming the gut microflora of M. carcinus are distributed unevenly among the different organs analyzed.

  12. Current and future resources for functional metagenomics

    Directory of Open Access Journals (Sweden)

    Kathy Nguyen Lam

    2015-10-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries – physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.

  13. Challenges in Managing Information Extraction

    Science.gov (United States)

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  14. Evaluation of methods for the concentration and extraction of viruses from sewage in the context of metagenomic sequencing

    DEFF Research Database (Denmark)

    Hjelmsø, Mathis Hjort; Hellmér, Maria; Fernandez-Cassi, Xavier

    2017-01-01

    Viral sewage metagenomics is a novel field of study used for surveillance, epidemiological studies, and evaluation of waste water treatment efficiency. In raw sewage, human waste is mixed with household, industrial and drainage water, and virus particles are therefore only found in low concentrations ... ways employing a wide range of viral concentration and extraction procedures. However, there is limited knowledge of the efficacy and inherent biases associated with these methods with respect to viral sewage metagenomics, hampering the development of this field. By the use of next generation sequencing, this study aimed to evaluate the efficiency of four commonly applied viral concentration techniques (precipitation with polyethylene glycol, organic flocculation with skim milk, monolithic adsorption filtration and glass wool filtration) and extraction methods (Nucleospin RNA XS, QIAamp Viral RNA Mini Kit ...

  15. Scenario Customization for Information Extraction

    National Research Council Canada - National Science Library

    Yangarber, Roman

    2001-01-01

    Information Extraction (IE) is an emerging NLP technology, whose function is to process unstructured, natural language text, to locate specific pieces of information, or facts, in the text, and to use these facts to fill a database...

  16. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
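
    To make the checklist concrete, here is a small sketch of how a metagenome-assembled genome might be binned into MIMAG quality tiers from completeness and contamination estimates. The thresholds below reflect the published standard as commonly cited (high-quality draft: >90% complete, <5% contamination, plus rRNA/tRNA criteria; medium-quality: ≥50% complete, <10% contamination; otherwise low-quality), but they are quoted from memory and should be verified against the checklist itself; this is not code from the paper.

        def mimag_quality_tier(completeness, contamination,
                               has_rrnas_and_trnas=False):
            """Rough MIMAG tier from completeness/contamination (percent).
            Thresholds quoted from memory of the standard; verify before use."""
            if completeness > 90 and contamination < 5 and has_rrnas_and_trnas:
                return "high-quality draft"
            if completeness >= 50 and contamination < 10:
                return "medium-quality draft"
            if contamination < 10:
                return "low-quality draft"
            return "does not meet MIMAG draft criteria"

        print(mimag_quality_tier(93.4, 2.1, has_rrnas_and_trnas=True))
        print(mimag_quality_tier(61.0, 7.5))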

  17. Extracting useful information from images

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey

    2011-01-01

    The paper presents an overview of methods for extracting useful information from digital images. It covers various approaches that utilized different properties of images, like intensity distribution, spatial frequencies content and several others. A few case studies including isotropic and heterogeneous...

  18. Comparison of Boiling and Robotics Automation Method in DNA Extraction for Metagenomic Sequencing of Human Oral Microbes.

    Science.gov (United States)

    Yamagishi, Junya; Sato, Yukuto; Shinozaki, Natsuko; Ye, Bin; Tsuboi, Akito; Nagasaki, Masao; Yamashita, Riu

    2016-01-01

    The rapid improvement of next-generation sequencing performance now enables us to analyze huge sample sets with more than ten thousand specimens. However, DNA extraction can still be a limiting step in such metagenomic approaches. In this study, we analyzed human oral microbes to compare the performance of three DNA extraction methods: PowerSoil (a method widely used in this field), QIAsymphony (a robotics method), and a simple boiling method. Dental plaque was initially collected from three volunteers in the pilot study and then expanded to 12 volunteers in the follow-up study. Bacterial flora was estimated by sequencing the V4 region of 16S rRNA following species-level profiling. Our results indicate that the efficiency of PowerSoil and QIAsymphony was comparable to the boiling method. Therefore, the boiling method may be a promising alternative because of its simplicity, cost effectiveness, and short handling time. Moreover, this method was reliable for estimating bacterial species and could be used in the future to examine the correlation between oral flora and health status. Despite this, differences in the efficiency of DNA extraction for various bacterial species were observed among the three methods. Based on these findings, there is no "gold standard" for DNA extraction. In future, we suggest that the DNA extraction method should be selected on a case-by-case basis considering the aims and specimens of the study.

  19. Comparison of Boiling and Robotics Automation Method in DNA Extraction for Metagenomic Sequencing of Human Oral Microbes.

    Directory of Open Access Journals (Sweden)

    Junya Yamagishi

    The rapid improvement of next-generation sequencing performance now enables us to analyze huge sample sets with more than ten thousand specimens. However, DNA extraction can still be a limiting step in such metagenomic approaches. In this study, we analyzed human oral microbes to compare the performance of three DNA extraction methods: PowerSoil (a method widely used in this field), QIAsymphony (a robotics method), and a simple boiling method. Dental plaque was initially collected from three volunteers in the pilot study and then expanded to 12 volunteers in the follow-up study. Bacterial flora was estimated by sequencing the V4 region of 16S rRNA following species-level profiling. Our results indicate that the efficiency of PowerSoil and QIAsymphony was comparable to the boiling method. Therefore, the boiling method may be a promising alternative because of its simplicity, cost effectiveness, and short handling time. Moreover, this method was reliable for estimating bacterial species and could be used in the future to examine the correlation between oral flora and health status. Despite this, differences in the efficiency of DNA extraction for various bacterial species were observed among the three methods. Based on these findings, there is no "gold standard" for DNA extraction. In future, we suggest that the DNA extraction method should be selected on a case-by-case basis considering the aims and specimens of the study.

  20. MetaLIMS, a simple open-source laboratory information management system for small metagenomic labs.

    Science.gov (United States)

    Heinle, Cassie Elizabeth; Gaultier, Nicolas Paul Eugène; Miller, Dana; Purbojati, Rikky Wenang; Lauro, Federico M

    2017-06-01

    As the cost of sequencing continues to fall, smaller groups increasingly initiate and manage larger sequencing projects and take on the complexity of data storage for high volumes of samples. This has created a need for low-cost laboratory information management systems (LIMS) that contain flexible fields to accommodate the unique nature of individual labs. Many labs do not have a dedicated information technology position, so a LIMS must also be easy to set up and maintain with minimal technical proficiency. MetaLIMS is a free and open-source web-based application available via GitHub. The focus of MetaLIMS is to store sample metadata prior to sequencing and analysis pipelines. Initially designed for environmental metagenomics labs, in addition to storing generic sample collection information and DNA/RNA processing information, it also lets the user add fields specific to the user's lab. MetaLIMS can also produce a basic sequencing submission form compatible with the proprietary Clarity LIMS system used by some sequencing facilities. To help ease the technical burden associated with web deployment, MetaLIMS offers the option of commercial web hosting combined with MetaLIMS bash scripts for ease of setup. MetaLIMS overcomes key challenges common in LIMS by giving labs access to a low-cost and open-source tool that also has the flexibility to meet individual lab needs and an option for easy deployment. By making the web application open source and hosting it on GitHub, we hope to encourage the community to build upon MetaLIMS, making it more robust and tailored to the needs of more researchers. © The Authors 2017. Published by Oxford University Press.

  1. BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Cathy H. [Univ. of Delaware, Newark, DE (United States). Center for Bioinformatics and Computational Biology; Hirschman, Lynette [The MITRE Corporation, Bedford, MA (United States)

    2016-10-29

    The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.

  2. A primer on metagenomics.

    Directory of Open Access Journals (Sweden)

    John C Wooley

    2010-02-01

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

  3. Extracting information from multiplex networks

    Science.gov (United States)

    Iacovacci, Jacopo; Bianconi, Ginestra

    2016-06-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous, since they describe social, financial, engineering, and biological networks alike. Extending our ability to analyze complex networks to multiplex network structures greatly increases the level of information that can be extracted from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ̃^S for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.
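
    The abstract does not define the Multiplex PageRank algorithm itself. As a rough illustration of the underlying idea only, that centrality in one layer can bias the random walk on another, here is a simplified two-layer sketch; it is not the authors' formulation, and the adjacency matrices and coupling rule are invented for the example.

        import numpy as np

        def biased_pagerank(adj, weights=None, damping=0.85, iters=200):
            """Power-iteration PageRank on adjacency matrix adj (rows = targets,
            columns = sources). Optional node weights bias each step toward
            high-weight targets, a crude stand-in for inter-layer coupling."""
            n = adj.shape[0]
            w = np.ones(n) if weights is None else np.asarray(weights, float)
            m = adj * w[:, None]              # edge weight scaled by target's weight
            col_sums = m.sum(axis=0)
            col_sums[col_sums == 0] = 1.0     # guard against dangling columns
            m = m / col_sums
            x = np.full(n, 1.0 / n)
            for _ in range(iters):
                x = damping * m @ x + (1 - damping) / n
            return x / x.sum()

        # Two hypothetical layers over the same four nodes.
        layer_a = np.array([[0, 1, 0, 0],
                            [1, 0, 1, 1],
                            [0, 1, 0, 1],
                            [0, 1, 1, 0]], float)
        layer_b = np.array([[0, 1, 1, 0],
                            [1, 0, 0, 0],
                            [1, 0, 0, 1],
                            [0, 0, 1, 0]], float)

        rank_a = biased_pagerank(layer_a)                    # centrality in layer A
        rank_ab = biased_pagerank(layer_b, weights=rank_a)   # A biases the walk on B
        print(rank_ab.round(3))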

  4. An Improved Method for High Quality Metagenomics DNA Extraction from Human and Environmental Samples

    DEFF Research Database (Denmark)

    Bag, Satyabrata; Saha, Bipasa; Mehta, Ojasvi

    2016-01-01

    ... and human origin samples. We introduced a combination of physical, chemical and mechanical lysis methods for proper lysis of microbial inhabitants. The community microbial DNA was precipitated by using salt and organic solvent. Both the quality and quantity of isolated DNA were compared with the existing methodologies and the superiority of our method was confirmed. Maximum recovery of genomic DNA in the absence of a substantial amount of impurities made the method convenient for nucleic acid extraction. The nucleic acids obtained using this method are suitable for different downstream applications. This improved ...

  5. Microbial food safety: Potential of DNA extraction methods for use in diagnostic metagenomics

    DEFF Research Database (Denmark)

    Josefsen, Mathilde Hasseldam; Andersen, Sandra Christine; Christensen, Julia

    2015-01-01

    ... yielding protocols. The PowerLyzer PowerSoil DNA Isolation Kit performed significantly better than all other protocols tested. Selected protocols were modified, i.e., extended heating and homogenization, resulting in increased yields of total DNA. For the QIAamp Fast DNA Stool Mini Kit (Qiagen) a 7-fold ... of the protocols to extract DNA was observed. The highest DNA yield was obtained with the PowerLyzer PowerSoil DNA Isolation Kit, whereas the FastDNA SPIN Kit for Feces (MP Biomedicals) resulted in the highest amount of PCR-amplifiable C. jejuni DNA.

  6. Transductive Pattern Learning for Information Extraction

    National Research Council Canada - National Science Library

    McLernon, Brian; Kushmerick, Nicholas

    2006-01-01

    .... We present TPLEX, a semi-supervised learning algorithm for information extraction that can acquire extraction patterns from a small amount of labelled text in conjunction with a large amount of unlabelled text...

  7. Information Extraction for Social Media

    NARCIS (Netherlands)

    Habib, M. B.; Keulen, M. van

    2014-01-01

    The rapid growth in IT in the last two decades has led to a growth in the amount of information available online. A new style of sharing information is social media. Social media is a continuously and instantly updated source of information. In this position paper, we propose a framework for…

  8. Metagenomic Analysis of Dairy Bacteriophages

    DEFF Research Database (Denmark)

    Muhammed, Musemma K.; Kot, Witold; Neve, Horst

    2017-01-01

    Despite their huge potential for characterizing the biodiversity of phages, metagenomic studies are currently not available for dairy bacteriophages, partly due to the lack of a standard procedure for phage extraction. We optimized an extraction method that allows removal of the bulk protein from...

  9. Information Extraction From Chemical Patents

    Directory of Open Access Journals (Sweden)

    Sandra Bergmann

    2012-01-01

    Full Text Available The development of new chemicals or pharmaceuticals is preceded by an indepth analysis of published patents in this field. This information retrieval is a costly and time inefficient step when done by a human reader, yet it is mandatory for potential success of an investment. The goal of the research project UIMA-HPC is to automate and hence speed-up the process of knowledge mining about patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources workflow control structures make it possible to dynamically allocate resources for every given task to gain best cpu-time/realtime ratios in an HPC environment.

  10. Extraction of inhibitor-free metagenomic DNA from polluted sediments, compatible with molecular diversity analysis using adsorption and ion-exchange treatments.

    Science.gov (United States)

    Desai, Chirayu; Madamwar, Datta

    2007-03-01

    PCR-inhibitor-free metagenomic DNA of high quality and high yield was extracted from highly polluted sediments using a simple remediation strategy of adsorption and ion-exchange chromatography. The extraction procedure was optimized with a series of steps involving gentle mechanical lysis, treatment with powdered activated charcoal (PAC) and ion-exchange chromatography with Amberlite resin. The quality of the extracted DNA for molecular diversity analysis was tested by amplifying bacterial 16S rDNA (16S rRNA genes) with eubacteria-specific universal primers (8f and 1492r), cloning of the amplified 16S rDNA and ARDRA (amplified rDNA restriction analysis) of the 16S rDNA clones. The presence of discrete differences in ARDRA banding profiles provided evidence for the expediency of the DNA extraction protocol in molecular diversity studies. A comparison of the optimized protocol with a commercial UltraClean Soil DNA Isolation Kit suggested that the method described in this report is more efficient in removing metallic and organic inhibitors from polluted sediment samples.

  11. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: G+C content, codon usage (Cd), trinucleotide usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences being discarded by common similarity-based methods in metagenomics, and of the kind of relevant information present in those sequences. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences carry relevant information for describing metagenomes that could be considered in whole-metagenome analyses in order to improve their organization, classification protocols, and their relation with the environment.
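
    The sequence patterns named above (G+C content and trinucleotide usage) are straightforward to compute. A minimal sketch on a toy sequence, not tied to the authors' pipeline:

        from collections import Counter
        from itertools import product

        def gc_content(seq):
            """Fraction of G and C bases in a DNA sequence."""
            seq = seq.upper()
            return (seq.count("G") + seq.count("C")) / len(seq)

        def trinucleotide_usage(seq):
            """Normalized frequencies of all 64 overlapping trinucleotides."""
            seq = seq.upper()
            counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
            total = sum(counts.values())
            return {"".join(t): counts.get("".join(t), 0) / total
                    for t in product("ACGT", repeat=3)}

        seq = "ATGCGTATCGCGGCATATATCGCGCGTTA"   # toy sequence
        print(round(gc_content(seq), 3))
        print(sorted(trinucleotide_usage(seq).items(),
                     key=lambda kv: -kv[1])[:5])  # five most frequent Tn words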

  12. Marine metagenomics as a source for bioprospecting

    KAUST Repository

    Kodzius, Rimantas

    2015-08-12

    This review summarizes the usage of genome-editing technologies for metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable or uncultivable. Metagenomics provides especially valuable information for uncultivable samples, from which novel genes, pathways and genomes can be deduced. Therefore, metagenomics, particularly together with genome engineering and systems biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective way to efficiently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and fine or bulk chemicals.

  13. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    Science.gov (United States)

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Extracting Information from Multimedia Meeting Collections

    OpenAIRE

    Gatica-Perez, Daniel; Zhang, Dong; Bengio, Samy

    2005-01-01

    Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest due to the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that such extraction...

  15. DKIE: Open Source Information Extraction for Danish

    DEFF Research Database (Denmark)

    Derczynski, Leon; Field, Camilla Vilhelmsen; Bøgh, Kenneth Sejdenfaden

    2014-01-01

    Danish is a major Scandinavian language spoken daily by around six million people. However, it lacks a unified, open set of NLP tools. This demonstration will introduce DKIE, an extensible open-source toolkit for processing Danish text. We implement an information extraction architecture for Danish...

  16. Exploring neighborhoods in the metagenome universe.

    Science.gov (United States)

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited to the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that, for a query metagenome from a particular habitat, on average nine out of ten nearest neighbors represent the same habitat category, independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
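
    As a sketch of the kind of profile-based nearest-neighbor retrieval described above (not the CoMet-Universe implementation), here is a cosine-distance k-NN over hypothetical functional profiles; the profile contents and database names are placeholders.

        import numpy as np

        def cosine_distance(a, b):
            """1 - cosine similarity between two abundance profiles."""
            return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

        def nearest_metagenomes(query, database, k=3):
            """Return the k database entries closest to query by cosine distance."""
            dists = [(name, cosine_distance(query, profile))
                     for name, profile in database.items()]
            return sorted(dists, key=lambda nd: nd[1])[:k]

        # Hypothetical profiles (e.g. normalized Pfam or KO counts per metagenome).
        rng = np.random.default_rng(0)
        database = {f"metagenome_{i}": rng.random(20) for i in range(10)}
        query = rng.random(20)
        for name, dist in nearest_metagenomes(query, database, k=3):
            print(f"{name}\t{dist:.3f}")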

  17. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available in pre-existing data to learn how to associate segments in the input string with attributes of a given domain, relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn structure-based features from test data, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a...

  18. Extracting the information backbone in online system.

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency.
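
    The abstract does not specify the exact link-removal criteria, so the sketch below is only a schematic of the general idea: score each user-object link by a topology-aware criterion (here, the object's degree, which is an assumption on my part, not the authors' rule) and discard a fraction of the lowest-scoring links before running the recommender.

        from collections import defaultdict

        def extract_backbone(links, keep_fraction=0.8):
            """Keep the keep_fraction of user-object links judged most informative.
            Scoring rule (an assumption for illustration only): links to
            low-degree objects carry more information than links to very
            popular objects."""
            object_degree = defaultdict(int)
            for _, obj in links:
                object_degree[obj] += 1
            scored = sorted(links, key=lambda uo: object_degree[uo[1]])
            n_keep = int(len(scored) * keep_fraction)
            return scored[:n_keep]

        # Toy bipartite network: (user, object) pairs.
        links = [("u1", "o1"), ("u1", "o2"), ("u2", "o1"), ("u2", "o3"),
                 ("u3", "o1"), ("u3", "o4"), ("u4", "o2"), ("u4", "o1")]
        print(extract_backbone(links, keep_fraction=0.75))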

  19. Metaproteomics: extracting and mining proteome information to characterize metabolic activities in microbial communities.

    Science.gov (United States)

    Abraham, Paul E; Giannone, Richard J; Xiong, Weili; Hettich, Robert L

    2014-06-17

    Contemporary microbial ecology studies usually employ one or more "omics" approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the microbial membership, providing a direct link between the genetic potential and functional metabolism. The successful deployment of metaproteomics research depends on the integration of high-quality experimental and bioinformatic techniques for uncovering the metabolic activities of a microbial community in a way that is complementary to other "meta-omic" approaches. The essential, quality-defining informatics steps in metaproteomics investigations are: (1) construction of the metagenome, (2) functional annotation of predicted protein-coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations in metaproteome studies in order to highlight the key considerations needed for successful implementation of this powerful community-biology tool. Copyright © 2014 John Wiley & Sons, Inc.

  20. Information extraction from muon radiography data

    International Nuclear Information System (INIS)

    Borozdin, K.N.; Asaki, T.J.; Chartrand, R.; Hengartner, N.W.; Hogan, G.E.; Morris, C.L.; Priedhorsky, W.C.; Schirato, R.C.; Schultz, L.J.; Sottile, M.J.; Vixie, K.R.; Wohlberg, B.E.; Blanpied, G.

    2004-01-01

    Scattering muon radiography was proposed recently as a technique for detection and 3-D imaging of dense high-Z objects. High-energy cosmic ray muons are deflected in matter in the process of multiple Coulomb scattering. By measuring the deflection angles we are able to reconstruct the configuration of high-Z material in the object. We discuss the methods for information extraction from muon radiography data. Tomographic methods widely used in medical imaging have been applied to a specific muon radiography information source. An alternative, simple technique based on counting highly scattered muons in the voxels appears efficient in many simulated scenes. SVM-based classifiers and clustering algorithms may allow detection of compact high-Z objects without full image reconstruction. The efficiency of muon radiography can be increased using additional information sources, such as momentum estimation, stopping power measurement, and detection of muonic atom emission.

  1. Metagenomic Analysis of Dairy Bacteriophages: Extraction Method and Pilot Study on Whey Samples Derived from Using Undefined and Defined Mesophilic Starter Cultures.

    Science.gov (United States)

    Muhammed, Musemma K; Kot, Witold; Neve, Horst; Mahony, Jennifer; Castro-Mejía, Josué L; Krych, Lukasz; Hansen, Lars H; Nielsen, Dennis S; Sørensen, Søren J; Heller, Knut J; van Sinderen, Douwe; Vogensen, Finn K

    2017-10-01

    Despite being potentially highly useful for characterizing the biodiversity of phages, metagenomic studies are currently not available for dairy bacteriophages, partly due to the lack of a standard procedure for phage extraction. We optimized an extraction method that allows the removal of the bulk protein from whey and milk samples with losses of less than 50% of spiked phages. The protocol was applied to extract phages from whey in order to test the notion that members of the Lactococcus lactis 936 (now Sk1virus), P335, c2 (now C2virus) and Leuconostoc phage groups are the most frequently encountered in the dairy environment. The relative abundance and diversity of phages in eight and four whey mixtures from dairies using undefined mesophilic mixed-strain cultures containing Lactococcus lactis subsp. lactis biovar diacetylactis and Leuconostoc species (i.e., DL starter cultures) and defined cultures, respectively, were assessed. Results obtained from transmission electron microscopy and high-throughput sequence analyses revealed the dominance of Lc. lactis 936 phages (order Caudovirales, family Siphoviridae) in dairies using undefined DL starter cultures and Lc. lactis c2 phages (order Caudovirales, family Siphoviridae) in dairies using defined cultures. The 936 and Leuconostoc phages demonstrated limited diversity. Possible coinduction of temperate P335 prophages and satellite phages in one of the whey mixtures was also observed. IMPORTANCE: The method optimized in this study could provide an important basis for understanding the dynamics of the phage community (abundance, development, diversity, evolution, etc.) in dairies with different sizes, locations, and production strategies. It may also enable the discovery of previously unknown phages, which is crucial for the development of rapid molecular biology-based methods for phage burden surveillance systems. The dominance of only a few phage groups in the dairy environment signifies the depth of knowledge...

  2. Comparative fecal metagenomics unveils unique functional capacity of the swine gut

    Directory of Open Access Journals (Sweden)

    Martinson John

    2011-05-01

    Background: Uncovering the taxonomic composition and functional capacity within the swine gut microbial consortia is of great importance to animal physiology and health as well as to food and water safety, due to the presence of human pathogens in pig feces. Nonetheless, limited information on the functional diversity of the swine gut microbiome is available. Results: Analysis of 637,722 pyrosequencing reads (130 megabases) generated from Yorkshire pig fecal DNA extracts was performed to help better understand the microbial diversity and largely unknown functional capacity of the swine gut microbiome. Swine fecal metagenomic sequences were annotated using both the MG-RAST and JGI IMG/M-ER pipelines. Taxonomic analysis of metagenomic reads indicated that swine fecal microbiomes were dominated by the Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, Prevotella spp. dominated the swine fecal metagenome, while some genes associated with Treponema and Anaerovibrio species were found exclusively within the pig fecal metagenomic sequences analyzed. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of the swine metagenome. Genes associated with stress, virulence, cell wall and cell capsule were also abundant. Virulence factors associated with antibiotic resistance genes with highest sequence homology to genes in Bacteroidetes, Clostridia, and Methanosarcina were numerous within the gene families unique to the swine fecal metagenomes. Other abundant proteins unique to the distal swine gut shared high sequence homology to putative carbohydrate membrane transporters. Conclusions: The results from this metagenomic survey demonstrated the presence of genes associated with resistance to antibiotics and carbohydrate metabolism, suggesting that the swine gut microbiome may be shaped by husbandry practices.

  3. Extracting the information backbone in online system.

    Directory of Open Access Journals (Sweden)

    Qian-Ming Zhang

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency.

  4. Extracting the Information Backbone in Online System

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such “less can be more” feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency. PMID:23690946

  5. Chaotic spectra: How to extract dynamic information

    International Nuclear Information System (INIS)

    Taylor, H.S.; Gomez Llorente, J.M.; Zakrzewski, J.; Kulander, K.C.

    1988-10-01

    Nonlinear dynamics is applied to chaotic unassignable atomic and molecular spectra with the aim of extracting detailed information about regular dynamic motions that exist over short intervals of time. It is shown how this motion can be extracted from high resolution spectra by doing low resolution studies or by Fourier transforming limited regions of the spectrum. These motions mimic those of periodic orbits (PO) and are inserts into the dominant chaotic motion. Considering these inserts and the PO as a dynamically decoupled region of space, resonant scattering theory and stabilization methods enable us to compute ladders of resonant states which interact with the chaotic quasi-continuum computed in principle from basis sets placed off the PO. The interaction of the resonances with the quasicontinuum explains the low resolution spectra seen in such experiments. It also allows one to associate low resolution features with a particular PO. The motion on the PO thereby supplies the molecular movements whose quantization causes the low resolution spectra. Characteristic properties of the periodic orbit based resonances are discussed. The method is illustrated on the photoabsorption spectrum of the hydrogen atom in a strong magnetic field and on the photodissociation spectrum of H3+. Other molecular systems which are currently under investigation using this formalism are also mentioned. 53 refs., 10 figs., 2 tabs
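
    As a toy illustration of Fourier transforming a limited region of a spectrum to expose recurrence times, the sketch below builds a fully synthetic spectrum containing a ladder of lines with spacing dE and recovers the corresponding peak near 1/dE; all parameters are invented and unrelated to the systems studied in the paper.

```python
# Sketch: Fourier transform of a limited spectral window to reveal a
# recurrence time associated with an embedded ladder of lines.
import numpy as np

n = 4096
energy = np.linspace(0.0, 1.0, n)
dE = 0.02                               # level spacing of the embedded ladder
spectrum = np.random.normal(0.0, 0.2, n)
for e0 in np.arange(0.1, 0.9, dE):
    spectrum += np.exp(-((energy - e0) / 0.0008) ** 2)

# Restrict to a limited window and Fourier transform it.
window = (energy > 0.3) & (energy < 0.6)
signal = np.abs(np.fft.rfft(spectrum[window] - spectrum[window].mean()))
times = np.fft.rfftfreq(window.sum(), d=energy[1] - energy[0])

# The dominant peak sits near 1/dE, the "recurrence time" of the ladder.
peak = times[1:][np.argmax(signal[1:])]
print(f"expected 1/dE = {1/dE:.1f}, recovered peak at {peak:.1f}")
```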

  6. Extraction of quantifiable information from complex systems

    CERN Document Server

    Dahmen, Wolfgang; Griebel, Michael; Hackbusch, Wolfgang; Ritter, Klaus; Schneider, Reinhold; Schwab, Christoph; Yserentant, Harry

    2014-01-01

    In April 2007, the  Deutsche Forschungsgemeinschaft (DFG) approved the  Priority Program 1324 “Mathematical Methods for Extracting Quantifiable Information from Complex Systems.” This volume presents a comprehensive overview of the most important results obtained over the course of the program.   Mathematical models of complex systems provide the foundation for further technological developments in science, engineering and computational finance.  Motivated by the trend toward steadily increasing computer power, ever more realistic models have been developed in recent years. These models have also become increasingly complex, and their numerical treatment poses serious challenges.   Recent developments in mathematics suggest that, in the long run, much more powerful numerical solution strategies could be derived if the interconnections between the different fields of research were systematically exploited at a conceptual level. Accordingly, a deeper understanding of the mathematical foundations as w...

  7. Extraction of temporal information in functional MRI

    Science.gov (United States)

    Singh, M.; Sungkarat, W.; Jeong, Jeong-Won; Zhou, Yongxia

    2002-10-01

    The temporal resolution of functional MRI (fMRI) is limited by the shape of the haemodynamic response function (hrf) and the vascular architecture underlying the activated regions. Typically, the temporal resolution of fMRI is on the order of 1 s. We have developed a new data processing approach to extract temporal information on a pixel-by-pixel basis at the level of 100 ms from fMRI data. Instead of correlating or fitting the time-course of each pixel to a single reference function, which is the common practice in fMRI, we correlate each pixel's time-course to a series of reference functions that are shifted with respect to each other by 100 ms. The reference function yielding the highest correlation coefficient for a pixel is then used as a time marker for that pixel. A Monte Carlo simulation and experimental study of this approach were performed to estimate the temporal resolution as a function of signal-to-noise ratio (SNR) in the time-course of a pixel. Assuming a known and stationary hrf, the simulation and experimental studies suggest a lower limit in the temporal resolution of approximately 100 ms at an SNR of 3. The multireference function approach was also applied to extract timing information from an event-related motor movement study where the subjects flexed a finger on cue. The event was repeated 19 times with the event's presentation staggered to yield an approximately 100-ms temporal sampling of the haemodynamic response over the entire presentation cycle. The timing differences among different regions of the brain activated by the motor task were clearly visualized and quantified by this method. The results suggest that it is possible to achieve a temporal resolution of ~200 ms in practice with this approach.
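
    A minimal sketch of the multireference idea on synthetic data: correlate a pixel's time-course with copies of a reference function shifted in 100 ms steps and take the best-matching shift as that pixel's timing marker. The sampling interval, the sinusoidal stand-in for an hrf-based regressor and the noise level are illustrative assumptions.

```python
# Sketch of the multi-reference-function approach: pick the shift whose
# shifted reference correlates best with the pixel time-course.
import numpy as np

def timing_from_shifts(pixel_ts, reference, dt, max_shift, step=0.1):
    """Return the shift (seconds) of the best-correlating shifted reference."""
    shifts = np.arange(-max_shift, max_shift + step, step)
    t = np.arange(len(reference)) * dt
    best_shift, best_r = 0.0, -np.inf
    for s in shifts:
        shifted = np.interp(t - s, t, reference)   # reference delayed by s
        r = np.corrcoef(pixel_ts, shifted)[0, 1]
        if r > best_r:
            best_r, best_shift = r, s
    return best_shift, best_r

if __name__ == "__main__":
    dt = 1.0                                  # 1 s sampling, typical for fMRI
    t = np.arange(0, 60, dt)
    reference = np.sin(2 * np.pi * t / 20.0)  # stand-in for an hrf regressor
    true_delay = 0.3
    pixel = np.interp(t - true_delay, t, reference) + np.random.normal(0, 0.3, t.size)
    print(timing_from_shifts(pixel, reference, dt, max_shift=1.0))
```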

  8. Optical Aperture Synthesis Object's Information Extracting Based on Wavelet Denoising

    International Nuclear Information System (INIS)

    Fan, W J; Lu, Y

    2006-01-01

    Wavelet denoising is studied to improve the extraction of Fourier information about OAS (optical aperture synthesis) objects. Translation-invariant wavelet denoising, based on Donoho's wavelet soft-threshold denoising, is investigated to remove pseudo-Gibbs artifacts from soft-thresholded images. Object information extraction based on translation-invariant wavelet denoising is then studied. The study shows that wavelet threshold denoising improves the precision and repeatability of extracting object information from interferograms, and that translation-invariant wavelet denoising yields better information extraction than plain soft-threshold denoising.
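
    A minimal sketch of translation-invariant soft-threshold denoising by cycle spinning, assuming PyWavelets is available; the wavelet, threshold value and blocky test signal are arbitrary choices for illustration, not the paper's settings or data.

```python
# Translation-invariant soft-threshold denoising via cycle spinning:
# denoise circularly shifted copies, unshift, and average to suppress
# the pseudo-Gibbs artifacts of plain soft thresholding.
import numpy as np
import pywt

def soft_denoise(x, wavelet="db4", threshold=0.5):
    coeffs = pywt.wavedec(x, wavelet)
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

def cycle_spin_denoise(x, shifts=8, **kwargs):
    out = np.zeros_like(x, dtype=float)
    for s in range(shifts):
        shifted = np.roll(x, s)
        out += np.roll(soft_denoise(shifted, **kwargs), -s)
    return out / shifts

if __name__ == "__main__":
    t = np.linspace(0, 1, 1024)
    clean = np.sign(np.sin(2 * np.pi * 4 * t))        # blocky test signal
    noisy = clean + np.random.normal(0, 0.3, t.size)
    denoised = cycle_spin_denoise(noisy, shifts=16, threshold=0.8)
    print("rmse:", np.sqrt(np.mean((denoised - clean) ** 2)))
```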

  9. Respiratory Information Extraction from Electrocardiogram Signals

    KAUST Repository

    Amin, Gamal El Din Fathy

    2010-12-01

    The Electrocardiogram (ECG) is a tool measuring the electrical activity of the heart, and it is extensively used for diagnosis and monitoring of heart diseases. The ECG signal reflects not only the heart activity but also many other physiological processes. The respiratory activity is a prominent process that affects the ECG signal due to the close proximity of the heart and the lungs. In this thesis, several methods for the extraction of respiratory process information from the ECG signal are presented. These methods allow an estimation of the lung volume and the lung pressure from the ECG signal. The potential benefit of this is to eliminate the corresponding sensors used to measure the respiration activity. A reduction of the number of sensors connected to patients will increase patients’ comfort and reduce the costs associated with healthcare. As a further result, the efficiency of diagnosing respirational disorders will increase since the respiration activity can be monitored with a common, widely available method. The developed methods can also improve the detection of respirational disorders that occur while patients are sleeping. Such disorders are commonly diagnosed in sleeping laboratories where the patients are connected to a number of different sensors. Any reduction of these sensors will result in a more natural sleeping environment for the patients and hence a higher sensitivity of the diagnosis.

  10. Abundance profiling of specific gene groups using precomputed gut metagenomes yields novel biological hypotheses.

    Directory of Open Access Journals (Sweden)

    Konstantin Yarygin

    Full Text Available The gut microbiota is essentially a multifunctional bioreactor within a human being. The exploration of its enormous metabolic potential provides insights into the mechanisms underlying microbial ecology and interactions with the host. The data obtained using "shotgun" metagenomics capture information about the whole spectrum of microbial functions. However, each new study presenting new sequencing data tends to extract only a little of the information concerning the metabolic potential and often omits specific functions. A meta-analysis of the available data with an emphasis on biomedically relevant gene groups can unveil new global trends in the gut microbiota. As a step toward the reuse of metagenomic data, we developed a method for the quantitative profiling of user-defined groups of genes in human gut metagenomes. This method is based on the quick analysis of a gene coverage matrix obtained by pre-mapping the metagenomic reads to a global gut microbial catalogue. The method was applied to profile the abundance of several gene groups related to antibiotic resistance, phages, biosynthesis clusters and carbohydrate degradation in 784 metagenomes from healthy populations worldwide and patients with inflammatory bowel diseases and obesity. We discovered country-wise functional specifics in gut resistome and virome compositions. The most distinct features of the disease microbiota were found for Crohn's disease, followed by ulcerative colitis and obesity. Profiling of the genes belonging to crAssphage showed that its abundance varied across the world populations and was not associated with clinical status. We demonstrated temporal resilience of crAssphage and the influence of the sample preparation protocol on its detected abundance. Our approach offers a convenient method to add value to accumulated "shotgun" metagenomic data by helping researchers state and assess novel biological hypotheses.
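
    A minimal sketch of the profiling step described above, assuming a precomputed gene coverage matrix (genes by samples) and a user-defined gene group; the gene names and numbers are invented.

```python
# Profile a user-defined gene group from a gene coverage matrix: sum the
# coverage of the group's genes per sample and normalize by total coverage.
import numpy as np

def group_abundance(coverage, gene_ids, group_genes):
    """coverage: array (n_genes, n_samples); returns per-sample relative abundance."""
    idx = [i for i, g in enumerate(gene_ids) if g in group_genes]
    group_cov = coverage[idx, :].sum(axis=0)
    total_cov = coverage.sum(axis=0)
    return group_cov / np.where(total_cov > 0, total_cov, 1)

if __name__ == "__main__":
    gene_ids = ["geneA", "geneB", "geneC", "geneD"]
    coverage = np.array([[10, 0], [5, 2], [0, 8], [85, 90]], dtype=float)
    resistance_group = {"geneA", "geneB"}          # hypothetical gene group
    print(group_abundance(coverage, gene_ids, resistance_group))
```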

  11. Shotgun metagenomic data streams: surfing without fear

    Energy Technology Data Exchange (ETDEWEB)

    Berendzen, Joel R [Los Alamos National Laboratory

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
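
    The indexing strategy is described only by analogy to web search engines, so the sketch below shows a generic k-mer inverted index with a vote-based lookup, not the actual system; the sequences and identifiers are made up.

```python
# Generic sketch: index local sequence patterns (k-mers) like an inverted
# index, then classify a read by the reference sharing the most k-mers.
from collections import defaultdict, Counter

def kmers(seq, k=12):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_index(references, k=12):
    index = defaultdict(set)
    for ref_id, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(ref_id)
    return index

def classify(read, index, k=12):
    votes = Counter()
    for km in kmers(read, k):
        for ref_id in index.get(km, ()):
            votes[ref_id] += 1
    return votes.most_common(1)[0] if votes else None

if __name__ == "__main__":
    refs = {"refA": "ACGTACGTGGCCTTAAGGCCAATTCCGG",
            "refB": "TTTTGGGGCCCCAAAATTTTGGGGCCCC"}
    index = build_index(refs, k=8)
    print(classify("ACGTACGTGGCCTTAA", index, k=8))
```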

  12. Sample-based XPath Ranking for Web Information Extraction

    NARCIS (Netherlands)

    Jundt, Oliver; van Keulen, Maurice

    Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive as some

  13. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    CAMERA (Community Cyber-infrastructure for Advanced Microbial Ecology) .... Acidobacteria known to metabolize a variety of carbon sources .... [7] J Nesme et al., Back to the future of soil metagenomics, Frontiers in Microbiology, Vol. 7 ...

  14. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Metagenomics is a robust, interdisciplinary approach for studyingmicrobial community composition, function, and dynamics.It typically involves a core of molecular biology, microbiology,ecology, statistics, and computational biology. Excitingoutcomes anticipated from these studies include unravelingof complex interactions ...

  15. Ancient DNA analysis identifies marine mollusc shells as new metagenomic archives of the past

    DEFF Research Database (Denmark)

    Der Sarkissian, Clio; Pichereau, Vianney; Dupont, Catherine

    2017-01-01

    Marine mollusc shells enclose a wealth of information on coastal organisms and their environment. Their life history traits as well as (palaeo-) environmental conditions, including temperature, food availability, salinity and pollution, can be traced through the analysis of their shell (micro...... extraction, high-throughput shotgun DNA sequencing and metagenomic analyses to marine mollusc shells spanning the last ~7,000 years. We report successful DNA extraction from shells, including a variety of ancient specimens, and find that DNA recovery is highly dependent on their biomineral structure......, carbonate layer preservation and disease state. We demonstrate positive taxonomic identification of mollusc species using a combination of mitochondrial DNA genomes, barcodes, genome-scale data and metagenomic approaches. We also find shell biominerals to contain a diversity of microbial DNA from the marine...

  16. The Agent of extracting Internet Information with Lead Order

    Science.gov (United States)

    Mo, Zan; Huang, Chuliang; Liu, Aijun

    In order to carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described that deals with the problems of extracting Internet information caused by the non-standard and inconsistent structure of Chinese websites. The agent comprises three modules, each responsible for one stage of the extraction process. An HTTP-tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to structure the extracted natural-language information is also discussed.

  17. Interactive metagenomic visualization in a Web browser

    Directory of Open Access Journals (Sweden)

    Phillippy Adam M

    2011-09-01

    Full Text Available Abstract Background A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  18. Cause Information Extraction from Financial Articles Concerning Business Performance

    Science.gov (United States)

    Sakai, Hiroyuki; Masuyama, Shigeru

    We propose a method of extracting cause information from Japanese financial articles concerning business performance. Our method acquires cause information, e.g., “zidousya no uriage ga koutyou” (sales of cars were good). Cause information is useful for investors in selecting companies to invest in. Our method automatically extracts cause information in the form of causal expressions by using statistical information and initial clue expressions. It can extract causal expressions without predetermined patterns or complex hand-crafted rules, and it is expected to be applicable to other tasks for acquiring phrases with a particular meaning, not limited to cause information. We compared our method with our previous one, originally proposed for extracting phrases concerning traffic accident causes, and experimental results showed that the new method outperforms the previous one.

  19. Critical Assessment of Metagenome Interpretation

    DEFF Research Database (Denmark)

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter

    2017-01-01

    Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchma...

  20. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    Science.gov (United States)

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  1. FANTOM: Functional and taxonomic analysis of metagenomes

    Directory of Open Access Journals (Sweden)

    Sanli Kemal

    2013-02-01

    Full Text Available Abstract Background Interpretation of quantitative metagenomics data is important for our understanding of ecosystem functioning and assessing differences between various environmental samples. There is a need for an easy to use tool to explore the often complex metagenomics data in taxonomic and functional context. Results Here we introduce FANTOM, a tool that allows for exploratory and comparative analysis of metagenomics abundance data integrated with metadata information and biological databases. Importantly, FANTOM can make use of any hierarchical database and it comes supplied with NCBI taxonomic hierarchies as well as KEGG Orthology, COG, PFAM and TIGRFAM databases. Conclusions The software is implemented in Python, is platform independent, and is available at http://www.sysbio.se/Fantom.

  2. Can we replace curation with information extraction software?

    Science.gov (United States)

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.Database URL. © The Author(s) 2016. Published by Oxford University Press.

  3. Mining knowledge from text repositories using information extraction ...

    Indian Academy of Sciences (India)

    Information extraction (IE); text mining; text repositories; knowledge discovery from text ... general-purpose English words. However ... of precision and recall, as extensive experimentation is required due to the lack of public tagged corpora.

  4. Mars Target Encyclopedia: Information Extraction for Planetary Science

    Science.gov (United States)

    Wagstaff, K. L.; Francis, R.; Gowda, T.; Lu, Y.; Riloff, E.; Singh, K.

    2017-06-01

    Mars surface targets / and published compositions / Seek and ye will find. We used text mining methods to extract information from LPSC abstracts about the composition of Mars surface targets. Users can search by element, mineral, or target.

  5. The functional potential of microbial communities in hydraulic fracturing source water and produced water from natural gas extraction characterized by metagenomic sequencing.

    Directory of Open Access Journals (Sweden)

    Arvind Murali Mohan

    Full Text Available Microbial activity in produced water from hydraulic fracturing operations can lead to undesired environmental impacts and increase gas production costs. However, the metabolic profile of these microbial communities is not well understood. Here, for the first time, we present results from a shotgun metagenome of microbial communities in both hydraulic fracturing source water and wastewater produced by hydraulic fracturing. Taxonomic analyses showed an increase in anaerobic/facultative anaerobic classes related to Clostridia, Gammaproteobacteria, Bacteroidia and Epsilonproteobacteria in produced water as compared to predominantly aerobic Alphaproteobacteria in the fracturing source water. The metabolic profile revealed a relative increase in genes responsible for carbohydrate metabolism, respiration, sporulation and dormancy, iron acquisition and metabolism, stress response and sulfur metabolism in the produced water samples. These results suggest that microbial communities in produced water have an increased genetic ability to handle stress, which has significant implications for produced water management, such as disinfection.

  6. Integrating Information Extraction Agents into a Tourism Recommender System

    Science.gov (United States)

    Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente

    Recommender systems face some problems. On the one hand, information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be worthwhile to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques in order to automatically extract and classify information from the Web. Its goal is to keep the system updated and obtain information about third-party services that are not offered by service providers inside the system.

  7. Addressing Information Proliferation: Applications of Information Extraction and Text Mining

    Science.gov (United States)

    Li, Jingjing

    2013-01-01

    The advent of the Internet and the ever-increasing capacity of storage media have made it easy to store, deliver, and share enormous volumes of data, leading to a proliferation of information on the Web, in online libraries, on news wires, and almost everywhere in our daily lives. Since our ability to process and absorb this information remains…

  8. Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

    Energy Technology Data Exchange (ETDEWEB)

    Morgan, Jenna L.; Darling, Aaron E.; Eisen, Jonathan A.

    2009-12-01

    Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing. Methodology/Principal Findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism was observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized. Conclusions/Significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with
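
    The central comparison in the study, observed read-based abundance versus the expected equal-cell proportions, reduces to simple arithmetic; the sketch below uses invented read counts.

```python
# Compare observed relative abundances (reads mapped per genome) with the
# expected equal-cell-number proportions of an in vitro-simulated community.
mapped_reads = {                      # hypothetical read counts per genome
    "org1": 120_000, "org2": 45_000, "org3": 300_000, "org4": 80_000,
}
total = sum(mapped_reads.values())
expected = 1.0 / len(mapped_reads)    # equal cell numbers in the mixture

for org, reads in sorted(mapped_reads.items()):
    observed = reads / total
    print(f"{org}: observed {observed:.3f}  expected {expected:.3f}  "
          f"fold deviation {observed / expected:.2f}")
```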

  9. Information extraction from multi-institutional radiology reports.

    Science.gov (United States)

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We

  10. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries.

    Science.gov (United States)

    Coughlan, Laura M; Cotter, Paul D; Hill, Colin; Alvarez-Ordóñez, Avelino

    2015-01-01

    Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present, and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalyzing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries.

  11. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries

    Directory of Open Access Journals (Sweden)

    Laura M Coughlan

    2015-06-01

    Full Text Available Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalysing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries.

  12. Laboratory procedures to generate viral metagenomes.

    Science.gov (United States)

    Thurber, Rebecca V; Haynes, Matthew; Breitbart, Mya; Wegley, Linda; Rohwer, Forest

    2009-01-01

    This collection of laboratory protocols describes the steps to collect viruses from various samples with the specific aim of generating viral metagenome sequence libraries (viromes). Viral metagenomics, the study of uncultured viral nucleic acid sequences from different biomes, relies on several concentration, purification, extraction, sequencing and heuristic bioinformatic methods. No single technique can provide an all-inclusive approach, and therefore the protocols presented here will be discussed in terms of hypothetical projects. However, care must be taken to individualize each step depending on the source and type of viral-particles. This protocol is a description of the processes we have successfully used to: (i) concentrate viral particles from various types of samples, (ii) eliminate contaminating cells and free nucleic acids and (iii) extract, amplify and purify viral nucleic acids. Overall, a sample can be processed to isolate viral nucleic acids suitable for high-throughput sequencing in approximately 1 week.

  13. Fine-grained information extraction from German transthoracic echocardiography reports.

    Science.gov (United States)

    Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank

    2015-11-12

    Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that enables it to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated on documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout. The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports
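
    Since the results are reported as micro- and macro-averaged precision, recall and F1, the following worked sketch shows how the two averages differ; the per-concept counts are invented, not the study's.

```python
# Micro- vs macro-averaged precision, recall and F1 from per-concept counts.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# (tp, fp, fn) per extracted concept, purely illustrative
per_concept = {"LVEF": (98, 1, 2), "aortic_valve": (50, 0, 4), "mitral_valve": (30, 2, 1)}

# macro average: mean of per-concept F1 scores
macro_f1 = sum(prf(*c)[2] for c in per_concept.values()) / len(per_concept)

# micro average: pool the counts first, then compute one score
tp = sum(c[0] for c in per_concept.values())
fp = sum(c[1] for c in per_concept.values())
fn = sum(c[2] for c in per_concept.values())
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

print(f"micro P={micro_p:.3f} R={micro_r:.3f} F1={micro_f1:.3f}; macro F1={macro_f1:.3f}")
```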

  14. Extraction of Information of Audio-Visual Contents

    Directory of Open Access Journals (Sweden)

    Carlos Aguilar

    2011-10-01

    Full Text Available In this article we show how it is possible to use Channel Theory (Barwise and Seligman, 1997) for modeling the process of information extraction realized by audiences of audio-visual contents. To do this, we rely on the concepts proposed by Channel Theory and, especially, its treatment of representational systems. We then show how the information that an agent is capable of extracting from the content depends on the number of channels he is able to establish between the content and the set of classifications he is able to discriminate. The agent can attempt the extraction of information through these channels from the totality of the content; however, we discuss the advantages of extracting from its constituents in order to obtain a greater number of informational items that represent it. After showing how the extraction process is carried out for each channel, we propose a method of representing all the informative values an agent can obtain from a content using a matrix constituted by the channels the agent is able to establish on the content (source classifications), and the ones he can understand as individual (destination classifications). We finally show how this representation allows reflecting the evolution of the informative items through the evolution of the audio-visual content.

  15. Semantic Information Extraction of Lanes Based on Onboard Camera Videos

    Science.gov (United States)

    Tang, L.; Deng, T.; Ren, C.

    2018-04-01

    In the field of autonomous driving, semantic information of lanes is very important. This paper proposes a method of automatic detection of lanes and extraction of semantic information from onboard camera videos. The proposed method firstly detects the edges of lanes by the grayscale gradient direction, and improves the Probabilistic Hough transform to fit them; then, it uses the vanishing point principle to calculate the lane geometrical position, and uses lane characteristics to extract lane semantic information by the classification of decision trees. In the experiment, 216 road video images captured by a camera mounted onboard a moving vehicle were used to detect lanes and extract lane semantic information. The results show that the proposed method can accurately identify lane semantics from video images.
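
    As a rough sketch of the processing chain (edge detection followed by a probabilistic Hough fit), the snippet below uses the stock OpenCV Canny and HoughLinesP functions rather than the paper's gradient-direction edge detector and improved Hough transform; the image path and thresholds are placeholders.

```python
# Lane-edge sketch with standard OpenCV: Canny edges + probabilistic Hough.
import cv2
import numpy as np

def detect_lane_segments(image_path):
    frame = cv2.imread(image_path)
    if frame is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Probabilistic Hough transform returns segments as (x1, y1, x2, y2)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                               minLineLength=40, maxLineGap=10)
    return [] if segments is None else [tuple(s[0]) for s in segments]

if __name__ == "__main__":
    for x1, y1, x2, y2 in detect_lane_segments("road_frame.jpg"):
        print((x1, y1), "->", (x2, y2))
```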

  16. Bracken: estimating species abundance in metagenomics data

    Directory of Open Access Journals (Sweden)

    Jennifer Lu

    2017-01-01

    Full Text Available Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
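
    A much-simplified sketch of the redistribution idea behind Bracken: reads left at the genus level are pushed down to species according to per-species probabilities. Real Bracken derives those probabilities from k-mer assignments in the Kraken database; here they are approximated by species-level read shares, purely for illustration, and the counts are invented.

```python
# Simplified redistribution of genus-level reads to species, approximating
# the per-species probabilities by observed species-level read shares.
def reestimate_species(species_reads, genus_reads, genus_to_species):
    est = dict(species_reads)
    for genus, g_reads in genus_reads.items():
        members = genus_to_species[genus]
        total = sum(species_reads.get(s, 0) for s in members)
        for s in members:
            share = species_reads.get(s, 0) / total if total else 1 / len(members)
            est[s] = est.get(s, 0) + g_reads * share
    return est

if __name__ == "__main__":
    species_reads = {"B. fragilis": 800, "B. vulgatus": 200, "P. copri": 500}
    genus_reads = {"Bacteroides": 1000}            # reads stuck at genus level
    genus_to_species = {"Bacteroides": ["B. fragilis", "B. vulgatus"]}
    print(reestimate_species(species_reads, genus_reads, genus_to_species))
```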

  17. Beyond biodiversity: fish metagenomes.

    Directory of Open Access Journals (Sweden)

    Alba Ardura

    Full Text Available Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra-specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the barcoding target gene coi as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level.

  18. Beyond biodiversity: fish metagenomes.

    Science.gov (United States)

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra-specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the barcoding target gene coi as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level.
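
    As an illustration of the kind of community-level diversity index such a comparison relies on, the sketch below computes Shannon diversity from per-catch counts of coi sequence variants; the counts are invented, not the study's data.

```python
# Shannon diversity from counts of sequence variants per regional catch,
# treating each catch as one pooled "metagenome".
import math

def shannon(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

catches = {                           # variant counts per catch (made up)
    "Amazon":        [120, 110, 95, 4, 2],
    "Cantabric Sea": [60, 55, 50, 45, 40, 35, 30],
}
for region, counts in catches.items():
    print(f"{region}: H' = {shannon(counts):.3f} over {len(counts)} variants")
```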

  19. Knowledge Dictionary for Information Extraction on the Arabic Text Data

    Directory of Open Access Journals (Sweden)

    Wahyu Jauharis Saputra

    2013-04-01

    Full Text Available Information extraction is an early stage of textual data analysis. It is required to obtain information from textual data that can be used for further analysis, such as classification and categorization. Textual data are strongly influenced by the language. Arabic is gaining significant attention in many studies because the Arabic language is very different from others and, in contrast to other languages, tools and research on Arabic are still lacking. The information extracted using the knowledge dictionary is a concept expression. A knowledge dictionary is usually constructed manually by an expert; this takes a long time and is specific to a single problem. This paper proposes a method for automatically building a knowledge dictionary. The dictionary is formed by grouping sentences that express the same concept, under the assumption that such sentences have a high similarity value. The extracted concepts can be used as features for subsequent computational processes such as classification or categorization. The dataset used in this paper was an Arabic text dataset. The extraction results were tested using a decision tree classifier; the highest precision obtained was 71.0% and the highest recall was 75.0%.

  20. Ontology-Based Information Extraction for Business Intelligence

    Science.gov (United States)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  1. Soil metagenomics and tropical soil productivity

    OpenAIRE

    Garrett, Karen A.

    2009-01-01

    This presentation summarizes research in the soil metagenomics cross-cutting research activity. Soil metagenomics studies soil microbial communities as contributors to soil health. CCRA-4 (Soil Metagenomics)

  2. NAMED ENTITY RECOGNITION FROM BIOMEDICAL TEXT -AN INFORMATION EXTRACTION TASK

    Directory of Open Access Journals (Sweden)

    N. Kanya

    2016-07-01

    Full Text Available Biomedical text mining targets the extraction of significant information from biomedical archives. It encompasses Information Retrieval (IR) and Information Extraction (IE). Information retrieval retrieves the relevant biomedical literature documents from repositories such as PubMed and MedLine based on a search query. The IR process ends with the generation of a corpus of relevant documents retrieved from the publication databases based on the query. The IE task includes preprocessing of the documents, Named Entity Recognition (NER) and relationship extraction. This process draws on natural language processing, data mining techniques and machine learning algorithms. The preprocessing task includes tokenization, stop word removal, shallow parsing, and part-of-speech tagging. The NER phase involves recognition of well-defined objects such as genes, proteins or cell lines. This process leads to the next phase, the extraction of relationships (IE). The work was based on the machine learning algorithm Conditional Random Fields (CRF).
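
    A minimal sketch of the preprocessing steps listed above (tokenization, stop word removal, part-of-speech tagging) using NLTK; the CRF-based NER stage itself is not shown, the example sentence is invented, and the required NLTK resources are assumed to have been downloaded beforehand.

```python
# Preprocessing sketch: tokenize, drop stop words, POS-tag with NLTK.
# Requires the 'punkt', 'stopwords' and POS tagger resources
# (nltk.download(...)) to be installed beforehand.
import nltk
from nltk.corpus import stopwords

def preprocess(sentence):
    tokens = nltk.word_tokenize(sentence)
    stops = set(stopwords.words("english"))
    kept = [t for t in tokens if t.lower() not in stops and t.isalnum()]
    return nltk.pos_tag(kept)

if __name__ == "__main__":
    text = "The BRCA1 protein interacts with BARD1 in human cell lines."
    print(preprocess(text))
```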

  3. Multiple comparative metagenomics using multiset k-mer counting

    Directory of Open Access Journals (Sweden)

    Gaëtan Benoit

    2016-11-01

    Full Text Available Background Large-scale metagenomic projects aim to extract biodiversity knowledge across different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results. Methods These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today’s metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques which rely on an all-versus-all sequence alignment strategy or which are based on taxonomic profiling.
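
    The core idea, replacing species counts with k-mer counts in a standard ecological distance, can be sketched as follows with a Bray-Curtis distance over toy reads; this is an illustration of the principle, not Simka's implementation.

```python
# Ecological distance over k-mer counts instead of species counts.
from collections import Counter

def kmer_counts(reads, k=21):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def bray_curtis(c1, c2):
    shared = sum(min(c1[k], c2[k]) for k in c1.keys() & c2.keys())
    return 1.0 - 2.0 * shared / (sum(c1.values()) + sum(c2.values()))

if __name__ == "__main__":
    sample_a = ["ACGTACGTACGTACGTACGTACGTA", "TTGGCCAATTGGCCAATTGGCCAAT"]
    sample_b = ["ACGTACGTACGTACGTACGTACGTA", "GGGGGGGGCCCCCCCCAAAAAAAAT"]
    a, b = kmer_counts(sample_a, k=11), kmer_counts(sample_b, k=11)
    print(f"Bray-Curtis distance: {bray_curtis(a, b):.3f}")
```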

  4. Evaluation of ddRADseq for reduced representation metagenome sequencing

    Directory of Open Access Journals (Sweden)

    Michael Y. Liu

    2017-09-01

    Full Text Available Background Profiling of microbial communities via metagenomic shotgun sequencing has enabled researchers to gain unprecedented insight into microbial community structure and the functional roles of community members. This study describes a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. Methods This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that lie between a pair of restriction sites located in close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection. We assessed the performance of the metagenomic ddRADseq approach by applying the full method to human stool samples and generating sequence data. Results The ddRADseq data yields a similar estimate of the community taxonomic profile to that obtained from shotgun metagenome sequencing of the same human stool samples. No obvious bias with respect to genomic G + C content and the estimated relative species abundance was detected. Discussion Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.
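
    A toy in silico sketch of the reduced-representation principle: cut a sequence at the sites of two enzymes, keep only fragments flanked by one site of each enzyme, and apply a size-selection window. The enzyme recognition sites and the size window are illustrative assumptions, not the study's protocol.

```python
# In silico double digest with size selection on a random test sequence.
import re

def ddrad_fragments(seq, site1="GAATTC", site2="CCGG", min_len=100, max_len=400):
    cuts = sorted(
        [(m.start(), "E1") for m in re.finditer(site1, seq)] +
        [(m.start(), "E2") for m in re.finditer(site2, seq)]
    )
    selected = []
    for (pos_a, enz_a), (pos_b, enz_b) in zip(cuts, cuts[1:]):
        length = pos_b - pos_a
        if enz_a != enz_b and min_len <= length <= max_len:
            selected.append((pos_a, pos_b, length))
    return selected

if __name__ == "__main__":
    import random
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(50_000))
    frags = ddrad_fragments(genome)
    print(f"{len(frags)} fragments retained after size selection")
```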

  5. The YNP metagenome project

    DEFF Research Database (Denmark)

    Inskeep, William P.; Jay, Zackary J.; Tringe, Susannah G.

    2013-01-01

    The Yellowstone geothermal complex contains over 10,000 diverse geothermal features that host numerous phylogenetically deeply rooted and poorly understood archaea, bacteria, and viruses. Microbial communities in high-temperature environments are generally less diverse than soil, marine, sediment......, and environmental variables. Twenty geochemically distinct geothermal ecosystems representing a broad spectrum of Yellowstone hot-spring environments were used for metagenomic and geochemical analysis and included approximately equal numbers of: (1) phototrophic mats, (2) “filamentous streamer” communities, and (3...

  6. A Two-Step Resume Information Extraction Algorithm

    Directory of Open Access Journals (Sweden)

    Jie Chen

    2018-01-01

    Full Text Available With the rapid growth of Internet-based recruiting, there are a great number of personal resumes among recruiting systems. To gain more attention from the recruiters, most resumes are written in diverse formats, including varying font size, font colour, and table cells. However, the diversity of format is harmful to data mining, such as resume information extraction, automatic job matching, and candidates ranking. Supervised methods and rule-based methods have been proposed to extract facts from resumes, but they strongly rely on hierarchical structure information and large amounts of labelled data, which are hard to collect in reality. In this paper, we propose a two-step resume information extraction approach. In the first step, raw text of resume is identified as different resume blocks. To achieve the goal, we design a novel feature, Writing Style, to model sentence syntax information. Besides word index and punctuation index, word lexical attribute and prediction results of classifiers are included in Writing Style. In the second step, multiple classifiers are employed to identify different attributes of fact information in resumes. Experimental results on a real-world dataset show that the algorithm is feasible and effective.

  7. Optimum detection for extracting maximum information from symmetric qubit sets

    International Nuclear Information System (INIS)

    Mizuno, Jun; Fujiwara, Mikio; Sasaki, Masahide; Akiba, Makoto; Kawanishi, Tetsuya; Barnett, Stephen M.

    2002-01-01

    We demonstrate a class of optimum detection strategies for extracting the maximum information from sets of equiprobable real symmetric qubit states of a single photon. These optimum strategies have been predicted by Sasaki et al. [Phys. Rev. A 59, 3325 (1999)]. The peculiar aspect is that the detections with at least three outputs suffice for optimum extraction of information regardless of the number of signal elements. The cases of ternary (or trine), quinary, and septenary polarization signals are studied where a standard von Neumann detection (a projection onto a binary orthogonal basis) fails to access the maximum information. Our experiments demonstrate that it is possible with present technologies to attain about 96% of the theoretical limit

  8. Extracting Semantic Information from Visual Data: A Survey

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2016-03-01

    Full Text Available The traditional environment maps built by mobile robots include both metric ones and topological ones. These maps are navigation-oriented and not adequate for service robots to interact with or serve human users who normally rely on the conceptual knowledge or semantic contents of the environment. Therefore, the construction of semantic maps becomes necessary for building an effective human-robot interface for service robots. This paper reviews recent research and development in the field of visual-based semantic mapping. The main focus is placed on how to extract semantic information from visual data in terms of feature extraction, object/place recognition and semantic representation methods.

  9. Rapid automatic keyword extraction for information retrieval and analysis

    Science.gov (United States)

    Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]

    2012-03-06

    Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
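
    A minimal sketch of the scoring scheme described in the abstract: candidate phrases are split at stop words and delimiters, word scores come from co-occurrence degree and frequency, and a phrase score is the sum of its word scores. The stop word list here is a tiny stand-in.

```python
# Minimal RAKE-style keyword scoring.
import re
from collections import defaultdict

STOP = {"a", "an", "and", "the", "of", "for", "in", "on", "is", "are", "to", "with"}

def candidate_phrases(text):
    words = re.split(r"[^a-zA-Z0-9]+", text.lower())
    phrase, phrases = [], []
    for w in words:
        if not w or w in STOP:
            if phrase:
                phrases.append(phrase)
            phrase = []
        else:
            phrase.append(w)
    if phrase:
        phrases.append(phrase)
    return phrases

def rake(text):
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for p in phrases:
        for w in p:
            freq[w] += 1
            degree[w] += len(p) - 1          # co-occurrences within the phrase
    word_score = {w: (degree[w] + freq[w]) / freq[w] for w in freq}
    return sorted(((" ".join(p), sum(word_score[w] for w in p)) for p in phrases),
                  key=lambda x: -x[1])

if __name__ == "__main__":
    print(rake("Rapid automatic keyword extraction for information retrieval and analysis")[:3])
```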

  10. Robust Vehicle and Traffic Information Extraction for Highway Surveillance

    Directory of Open Access Journals (Sweden)

    Yeh Chia-Hung

    2005-01-01

    A robust vision-based traffic monitoring system for vehicle and traffic information extraction is developed in this research. It is challenging to maintain detection robustness at all times for a highway surveillance system. There are three major problems in detecting and tracking a vehicle: (1) the moving cast shadow effect, (2) the occlusion effect, and (3) nighttime detection. For moving cast shadow elimination, a 2D joint vehicle-shadow model is employed. For occlusion detection, a multiple-camera system is used to detect occlusion so as to extract the exact location of each vehicle. For vehicle nighttime detection, a rear-view monitoring technique is proposed to maintain tracking and detection accuracy. Furthermore, we propose a method to improve the accuracy of background extraction, which usually serves as the first step in any vehicle detection pipeline. Experimental results are given to demonstrate that the proposed techniques are effective and efficient for vision-based highway surveillance.
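
    As a hedged illustration of that first background-extraction step only, the OpenCV sketch below segments moving vehicles from a fixed highway camera with a mixture-of-Gaussians background model. The video file name is a placeholder, and the paper's joint vehicle-shadow model, multi-camera occlusion handling, and night-time rear-view technique are not reproduced here.

    ```python
    import cv2

    capture = cv2.VideoCapture("highway.mp4")          # hypothetical input video
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        # MOG2 marks shadow pixels with value 127; drop them before blob finding.
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            if cv2.contourArea(contour) > 500:          # ignore small blobs
                x, y, w, h = cv2.boundingRect(contour)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("vehicles", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    capture.release()
    cv2.destroyAllWindows()
    ```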

  11. A highly optimized grid deployment: the metagenomic analysis example.

    Science.gov (United States)

    Aparicio, Gabriel; Blanquer, Ignacio; Hernández, Vicente

    2008-01-01

    Computational resources and computationally expensive processes are not growing at the same rate. The availability of large amounts of computing resources in Grid infrastructures does not mean that efficiency is unimportant. It is necessary to analyze the whole process to improve partitioning and submission schemas, especially in the most critical experiments. This is the case for metagenomic analysis, and this text shows the work done to optimize a Grid deployment, which has led to a reduction in response time and failure rates. Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. In many cases, the sequencing of the DNA of many microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone and require interaction with other organisms. In such cases, the available DNA information belongs to different kinds of organisms. One important stage in metagenomic analysis consists of the extraction of fragments, followed by a comparison and functional-analysis stage. By comparison to existing sequences whose function is well known, fragments can be classified. This process is computationally intensive and requires several iterations of alignment and phylogeny classification steps. Source samples reach several million sequences, each of which can be up to thousands of nucleotides long. These sequences are compared to a selected part of the "Non-redundant" database covering only the information from eukaryotic species. From this first analysis, a refining process is performed and the alignment analysis is restarted from the results. This process takes several CPU years. The article describes and analyzes the difficulties of fragmenting, automating, and checking the above operations in current Grid production environments. This environment has been
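
    The partitioning step mentioned above can be illustrated with a small Python helper that splits a large FASTA file of reads into fixed-size chunks, each of which could then be submitted as an independent Grid alignment job. The file name and chunk size are illustrative, the sketch assumes a well-formed FASTA file, and it is not the deployment described in the record.

    ```python
    def split_fasta(path, sequences_per_chunk=10000, prefix="chunk"):
        # Stream the input once, starting a new chunk file every
        # `sequences_per_chunk` header lines.
        chunk_index, count, out = 0, 0, None
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    if count % sequences_per_chunk == 0:
                        if out:
                            out.close()
                        chunk_index += 1
                        out = open(f"{prefix}_{chunk_index:04d}.fasta", "w")
                    count += 1
                out.write(line)   # assumes the file starts with a ">" header
        if out:
            out.close()
        return chunk_index

    if __name__ == "__main__":
        n = split_fasta("metagenome_reads.fasta")   # hypothetical input file
        print(f"wrote {n} chunks ready for independent alignment jobs")
    ```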

  12. Advanced applications of natural language processing for performing information extraction

    CERN Document Server

    Rodrigues, Mário

    2015-01-01

    This book explains how to create information extraction (IE) applications that can tap the vast amount of relevant information available in natural language sources: Internet pages, official documents such as laws and regulations, books and newspapers, and the social web. Readers are introduced to the problem of IE and its current challenges and limitations, supported with examples. The book discusses the need to fill the gap between documents, data, and people, and provides a broad overview of the technology supporting IE. The authors present a generic architecture for developing systems that are able to learn how to extract relevant information from natural language documents, and illustrate how to implement working systems using state-of-the-art and freely available software tools. The book also discusses concrete applications illustrating IE uses. · Provides an overview of state-of-the-art technology in information extraction (IE), discussing achievements and limitations for t...

  13. A Bioinformatician's Guide to Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets by contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

  14. Improving information extraction using a probability-based approach

    DEFF Research Database (Denmark)

    Kim, S.; Ahmed, Saeema; Wallace, K.

    2007-01-01

    Information plays a crucial role during the entire life-cycle of a product. It has been shown that engineers frequently consult colleagues to obtain the information they require to solve problems. However, the industrial world is now more transient and key personnel move to other companies...... or retire. It is becoming essential to retrieve vital information from archived product documents, if it is available. There is, therefore, great interest in ways of extracting relevant and sharable information from documents. A keyword-based search is commonly used, but studies have shown...... the recall, while maintaining the high precision, a learning approach that makes identification decisions based on a probability model, rather than simply looking up the presence of the pre-defined variations, looks promising. This paper presents the results of developing such a probability-based entity...

  15. Finding the needles in the meta-genome haystack

    NARCIS (Netherlands)

    Kowalchuk, G.A.; Speksnijder, A.G.C.L.; Zhang, K.; Goodman, R.M.; Veen, van J.A.

    2007-01-01

    In the collective genomes (the metagenome) of the microorganisms inhabiting the Earth's diverse environments is written the history of life on this planet. New molecular tools developed and used for the past 15 years by microbial ecologists are facilitating the extraction, cloning, screening, and

  16. The microbiome of Brazilian mangrove sediments as revealed by metagenomics

    NARCIS (Netherlands)

    Andreote, Fernando Dini; Jiménez Avella, Diego; Chaves, Diego; Dias, Armando Cavalcante Franco; Luvizotto, Danice Mazzer; Dini-Andreote, Francisco; Fasanella, Cristiane Cipola; Lopez, Maryeimy Varon; Baena, Sandra; Taketani, Rodrigo Gouvêa; de Melo, Itamar Soares

    2012-01-01

    Here we embark on a deep metagenomic survey that reveals taxonomic and potential metabolic-pathway aspects of mangrove sediment microbiology. The extraction of DNA from sediment samples and the direct application of pyrosequencing resulted in approximately 215 Mb of data from four distinct

  17. Transliteration normalization for Information Extraction and Machine Translation

    Directory of Open Access Journals (Sweden)

    Yuval Marton

    2014-12-01

    Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural language processing (NLP) tasks. This work aims to identify and cluster name spelling variants using a Statistical Machine Translation method: word alignment. The variants are identified by being aligned to the same “pivot” name in another language (the source language in Machine Translation settings). Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, names with similar spellings in the target language are clustered and then normalized to a canonical form. With this approach, tens of thousands of high-precision name transliteration spelling variants are extracted from sentence-aligned bilingual corpora in Arabic and English (in both languages). When these normalized name spelling variants are applied to Information Extraction tasks, improvements over strong baseline systems are observed. When applied to Machine Translation tasks, a large improvement potential is shown.
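
    The clustering-and-normalization step can be sketched with string similarity alone. The snippet below groups spelling variants whose similarity exceeds a threshold and maps each variant to a canonical form; the alignment to a pivot-language name and the translation/transliteration probabilities used in the paper are not modelled, and the names and threshold are illustrative.

    ```python
    from difflib import SequenceMatcher

    def similarity(a, b):
        # Ratio of matching characters between the two lowercased strings.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def cluster_variants(names, threshold=0.7):
        clusters = []
        for name in names:
            for cluster in clusters:
                if similarity(name, cluster[0]) >= threshold:
                    cluster.append(name)
                    break
            else:
                clusters.append([name])
        # Normalize each cluster to its first-seen spelling as the canonical form.
        return {variant: cluster[0] for cluster in clusters for variant in cluster}

    variants = ["Muhammad", "Mohammed", "Mohamad", "Qaddafi", "Gaddafi"]
    print(cluster_variants(variants))
    ```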

  18. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Chain, Patrick

    2011-10-13

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on Metagenome Assembly at the DOE JGI, at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  19. Knowledge discovery: Extracting usable information from large amounts of data

    International Nuclear Information System (INIS)

    Whiteson, R.

    1998-01-01

    The threat of nuclear weapons proliferation is a problem of worldwide concern. Safeguards are the key to nuclear nonproliferation, and data is the key to safeguards. The safeguards community has access to a huge and steadily growing volume of data. The advantages of this data-rich environment are obvious: there is a great deal of information which can be utilized. The challenge is to effectively apply proven and developing technologies to find and extract usable information from that data. That information must then be assessed and evaluated to produce the knowledge needed for crucial decision making. Efficient and effective analysis of safeguards data will depend on utilizing technologies to interpret the large, heterogeneous data sets that are available from diverse sources. With an order-of-magnitude increase in the amount of data from a wide variety of technical, textual, and historical sources, there is a vital need to apply advanced computer technologies to support all-source analysis. There are techniques of data warehousing, data mining, and data analysis that can provide analysts with tools that will expedite the extraction of usable information from the huge amounts of data to which they have access. Computerized tools can aid analysts by integrating heterogeneous data, evaluating diverse data streams, automating retrieval of database information, prioritizing inputs, reconciling conflicting data, doing preliminary interpretations, discovering patterns or trends in data, and automating some of the simpler prescreening tasks that are time consuming and tedious. Thus knowledge discovery technologies can provide a foundation of support for the analyst. Rather than spending time sifting through often irrelevant information, analysts could use their specialized skills in a focused, productive fashion. This would allow them to make their analytical judgments with more confidence and spend more of their time doing what they do best.

  20. Mining the metagenome of activated biomass of an industrial wastewater treatment plant by a novel method.

    Science.gov (United States)

    Sharma, Nandita; Tanksale, Himgouri; Kapley, Atya; Purohit, Hemant J

    2012-12-01

    Metagenomic libraries herald the era of magnifying the microbial world, tapping into the vast metabolic potential of uncultivated microbes, and enhancing the rate of discovery of novel genes and pathways. In this paper, we describe a method that facilitates the extraction of metagenomic DNA from activated sludge of an industrial wastewater treatment plant and its use in mining the metagenome via library construction. The efficiency of this method was demonstrated by the large representation of the bacterial genome in the constructed metagenomic libraries and by the functional clones obtained. The BAC library represented 95.6 times the bacterial genome, while the pUC library represented 41.7 times the bacterial genome. Twelve clones in the BAC library demonstrated lipolytic activity, while four clones demonstrated dioxygenase activity. Four clones in the pUC library tested positive for cellulase activity. This method, using FTA cards, can not only be used for library construction but also store the metagenome at room temperature.

  1. Evolving spectral transformations for multitemporal information extraction using evolutionary computation

    Science.gov (United States)

    Momm, Henrique; Easson, Greg

    2011-01-01

    Remote sensing plays an important role in assessing temporal changes in land features. The challenge often resides in the conversion of large quantities of raw data into actionable information in a timely and cost-effective fashion. To address this issue, research was undertaken to develop an innovative methodology integrating biologically-inspired algorithms with standard image classification algorithms to improve information extraction from multitemporal imagery. Genetic programming was used as the optimization engine to evolve feature-specific candidate solutions in the form of nonlinear mathematical expressions of the image spectral channels (spectral indices). The temporal generalization capability of the proposed system was evaluated by addressing the task of building rooftop identification from a set of images acquired at different dates in a cross-validation approach. The proposed system generates robust solutions (kappa values > 0.75 for stage 1 and > 0.4 for stage 2) despite the statistical differences between the scenes caused by land use and land cover changes coupled with variable environmental conditions, and the lack of radiometric calibration between images. Based on our results, the use of nonlinear spectral indices enhanced the spectral differences between features improving the clustering capability of standard classifiers and providing an alternative solution for multitemporal information extraction.
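
    As a toy stand-in for the evolutionary search described above (random sampling in place of genetic programming), the sketch below generates normalized-difference indices over synthetic six-band "rooftop" and "background" pixels and keeps the candidate with the best Fisher-style class separability. All data, band statistics and parameters are synthetic and illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    rooftop_means = np.array([0.25, 0.30, 0.35, 0.55, 0.60, 0.45])
    background_means = np.array([0.30, 0.32, 0.33, 0.35, 0.34, 0.33])
    rooftop = rng.normal(rooftop_means, 0.05, (300, 6))       # synthetic pixels
    background = rng.normal(background_means, 0.05, (300, 6))

    def fisher_separability(x, y):
        # Squared mean difference over pooled variance.
        return (x.mean() - y.mean()) ** 2 / (x.var() + y.var() + 1e-9)

    best = (-1.0, None)
    for _ in range(200):
        i, j = rng.choice(6, size=2, replace=False)
        nd = lambda arr: (arr[:, i] - arr[:, j]) / (arr[:, i] + arr[:, j] + 1e-9)
        score = fisher_separability(nd(rooftop), nd(background))
        if score > best[0]:
            best = (score, (i, j))

    (i, j) = best[1]
    print(f"best index: (b{i} - b{j}) / (b{i} + b{j}), separability = {best[0]:.2f}")
    ```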

  2. Recognition techniques for extracting information from semistructured documents

    Science.gov (United States)

    Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna

    2000-12-01

    Archives of optical documents are increasingly widely employed, a demand driven also by new norms sanctioning the legal value of digital documents, provided they are stored on physically unalterable media. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those of magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields of sender, addressee, subject, date, and body of the document.

  3. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

    Directory of Open Access Journals (Sweden)

    Pride David T

    2008-09-01

    Background: Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results: From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of

  4. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

    Science.gov (United States)

    Pride, David T; Schoenfeld, Thomas

    2008-09-17

    Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs
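
    The oligonucleotide-signature idea underlying GSPC can be illustrated compactly: represent every sequence by its tetranucleotide frequency vector and assign a fragment to the reference whose signature is closest. The sketch below uses tiny synthetic sequences and Euclidean distance purely for illustration; it is not the GSPC implementation.

    ```python
    from collections import Counter
    from itertools import product
    import math

    KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

    def signature(seq, k=4):
        # Normalized tetranucleotide frequency vector of a sequence.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts[m] for m in KMERS) or 1
        return [counts[m] / total for m in KMERS]

    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(fragment, references):
        sig = signature(fragment)
        return min(references, key=lambda name: distance(sig, references[name]))

    # Toy "reference" signatures and a query fragment.
    references = {
        "archaeal_virus": signature("ATATATGCGCATATTATAGCGCGAT" * 20),
        "bacterium": signature("GGCCGGCCTTAAGGCCGGCGGCCGG" * 20),
    }
    fragment = "ATATATGCGCATAT" * 10
    print(classify(fragment, references))   # expected: archaeal_virus
    ```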

  5. Assembling large, complex environmental metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    Howe, A. C. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences]; Jansson, J. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division]; Malfatti, S. A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Tringe, S. G. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Tiedje, J. M. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences]; Brown, C. T. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Computer Science and Engineering]

    2012-12-28

    The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.
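
    A compact sketch of the digital-normalization idea follows: stream reads, estimate each read's median k-mer abundance from the counts accumulated so far, and discard reads whose coverage already exceeds a cutoff. Production implementations (for example the khmer package) use probabilistic count structures to bound memory; the exact dictionary and toy reads below are purely illustrative.

    ```python
    from collections import Counter
    from statistics import median

    def digital_normalization(reads, k=20, cutoff=20):
        counts = Counter()
        kept = []
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            if not kmers:
                continue
            # Keep the read only if its current median k-mer coverage is low.
            if median(counts[km] for km in kmers) < cutoff:
                kept.append(read)
                counts.update(kmers)   # only kept reads add to the coverage table
        return kept

    if __name__ == "__main__":
        reads = ["ACGT" * 10] * 100 + ["ATGCGTACGTTAGCATGCAGTACGATCGATCGA"]
        kept = digital_normalization(reads, k=20, cutoff=5)
        print(f"kept {len(kept)} of {len(reads)} reads")
    ```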

  6. Automated extraction of chemical structure information from digital raster images

    Directory of Open Access Journals (Sweden)

    Shedden Kerby A

    2009-02-01

    Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated into a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper aims to provide critical reviews of these systems and also reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be run independently in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links

  7. Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    NARCIS (Netherlands)

    Habib, Mena Badieh; van Keulen, Maurice

    2011-01-01

    Information extraction, data integration, and uncertain data management are different areas of research that have received considerable attention over the last two decades. Many researchers have tackled these areas individually. However, information extraction systems should be integrated with data integration

  8. Information Extraction for Clinical Data Mining: A Mammography Case Study.

    Science.gov (United States)

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2009-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
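
    A schematic version of that pipeline (sentence splitting, lexicon-driven concept matching, and a window-based negation check) is sketched below. The lexicon entries, negation cues and report text are tiny illustrative placeholders; the actual system uses the full BI-RADS lexicon, a semantic grammar and expert input.

    ```python
    import re

    LEXICON = {
        "mass": "Mass",
        "calcification": "Calcification",
        "architectural distortion": "Architectural distortion",
    }
    NEGATIONS = {"no", "without", "absent"}

    def sentences(report):
        return [s.strip() for s in re.split(r"[.!?]", report) if s.strip()]

    def extract_concepts(report, window=4):
        findings = []
        for sentence in sentences(report):
            text = " ".join(sentence.lower().split())
            found = {}
            for term, concept in LEXICON.items():
                pos = text.find(term)
                if pos == -1:
                    continue
                # Look for a negation cue within a few tokens before the term.
                preceding = text[:pos].split()[-window:]
                negated = any(neg in preceding for neg in NEGATIONS)
                found[concept] = "absent" if negated else "present"
            findings.extend(sorted(found.items()))
        return findings

    report = ("There is a spiculated mass in the upper outer quadrant. "
              "No suspicious calcifications are seen.")
    print(extract_concepts(report))
    # -> [('Mass', 'present'), ('Calcification', 'absent')]
    ```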

  9. INFORMATION EXTRACTION IN TOMB PIT USING HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    X. Yang

    2018-04-01

    Hyperspectral data are characterized by many continuous bands, large data volumes, redundancy, and non-destructive acquisition. These characteristics make it possible to use hyperspectral data to study cultural relics. In this paper, hyperspectral imaging technology is adopted to recognize the images on the bottom of an ancient tomb located in Shanxi province. There are many black remains on the bottom surface of the tomb, which are suspected to be meaningful texts or paintings. Firstly, the hyperspectral data are preprocessed to obtain the reflectance of the region of interest; for convenience of computation and storage, the original reflectance values are multiplied by 10,000. Secondly, three methods are used to extract the symbols at the bottom of the ancient tomb. Finally, morphological operations are applied to connect the symbols, yielding fifteen reference images. The results show that information extraction based on hyperspectral data provides a better visual experience, which benefits the study of ancient tombs and offers a reference for archaeological research.

  10. Information Extraction in Tomb Pit Using Hyperspectral Data

    Science.gov (United States)

    Yang, X.; Hou, M.; Lyu, S.; Ma, S.; Gao, Z.; Bai, S.; Gu, M.; Liu, Y.

    2018-04-01

    Hyperspectral data are characterized by many continuous bands, large data volumes, redundancy, and non-destructive acquisition. These characteristics make it possible to use hyperspectral data to study cultural relics. In this paper, hyperspectral imaging technology is adopted to recognize the images on the bottom of an ancient tomb located in Shanxi province. There are many black remains on the bottom surface of the tomb, which are suspected to be meaningful texts or paintings. Firstly, the hyperspectral data are preprocessed to obtain the reflectance of the region of interest; for convenience of computation and storage, the original reflectance values are multiplied by 10,000. Secondly, three methods are used to extract the symbols at the bottom of the ancient tomb. Finally, morphological operations are applied to connect the symbols, yielding fifteen reference images. The results show that information extraction based on hyperspectral data provides a better visual experience, which benefits the study of ancient tombs and offers a reference for archaeological research.

  11. Automated Extraction of Substance Use Information from Clinical Texts.

    Science.gov (United States)

    Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B

    2015-01-01

    Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance on a wide breadth of free-text clinical notes describing substance use when augmented with linguistic resources and domain knowledge.

  12. Domain-independent information extraction in unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.]

    1996-09-01

    Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development Project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.

  13. Extracting and Using Photon Polarization Information in Radiative B Decays

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Yuval

    2000-05-09

    The authors discuss the uses of conversion electron pairs for extracting photon polarization information in weak radiative B decays. Both cases of leptons produced through a virtual and a real photon are considered. Measurements of the angular correlation between the (Kπ) and (e⁺e⁻) decay planes in B → K*(→ Kπ)γ(*)(→ e⁺e⁻) decays can be used to determine the helicity amplitudes in the radiative B → K*γ decays. A large right-handed helicity amplitude in B-bar decays is a signal of new physics. The time-dependent CP asymmetry in the B⁰ decay angular correlation is shown to measure sin 2β and cos 2β with little hadronic uncertainty.

  14. Extraction of neutron spectral information from Bonner-Sphere data

    CERN Document Server

    Haney, J H; Zaidins, C S

    1999-01-01

    We have extended a least-squares method of extracting neutron spectral information from Bonner-Sphere data which was previously developed by Zaidins et al. (Med. Phys. 5 (1978) 42). A pulse-height analysis with background stripping is employed which provided a more accurate count rate for each sphere. Newer response curves by Mares and Schraube (Nucl. Instr. and Meth. A 366 (1994) 461) were included for the moderating spheres and the bare detector which comprise the Bonner spectrometer system. Finally, the neutron energy spectrum of interest was divided using the philosophy of fuzzy logic into three trapezoidal regimes corresponding to slow, moderate, and fast neutrons. Spectral data was taken using a PuBe source in two different environments and the analyzed data is presented for these cases as slow, moderate, and fast neutron fluences. (author)
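
    The least-squares spirit of the method can be illustrated with a toy unfolding: given each sphere's response to the three broad energy groups (slow, moderate, fast), solve for non-negative group fluences that best reproduce the measured count rates. The response matrix and count rates below are made-up numbers, not the published response functions of Mares and Schraube.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    # Rows: detector configurations; columns: slow, moderate, fast groups.
    response = np.array([
        [0.80, 0.30, 0.05],   # bare detector
        [0.40, 0.60, 0.20],   # small moderating sphere
        [0.15, 0.55, 0.45],   # medium sphere
        [0.05, 0.30, 0.70],   # large sphere
    ])
    count_rates = np.array([125.0, 168.0, 157.0, 136.0])

    # Non-negative least squares keeps the unfolded fluences physical.
    fluence, residual = nnls(response, count_rates)
    for group, value in zip(["slow", "moderate", "fast"], fluence):
        print(f"{group:8s} fluence: {value:7.1f}")
    print(f"residual norm: {residual:.2f}")
    ```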

  15. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules...... for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological...... analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora....

  16. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  17. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Directory of Open Access Journals (Sweden)

    Gustavo Arango-Argoty

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  18. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  19. Comparative metagenomics of the Red Sea

    KAUST Repository

    Mineta, Katsuhiko

    2016-01-01

    started monthly samplings of the metagenomes in the Red Sea under KAUST-CCF project. In collaboration with Kitasato University, we also collected the metagenome data from the ocean in Japan, which shows contrasting features to the Red Sea. Therefore

  20. Marine metagenomics as a source for bioprospecting

    KAUST Repository

    Kodzius, Rimantas; Gojobori, Takashi

    2015-01-01

    This review summarizes usage of genome-editing technologies for metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable

  1. Ancient DNA analysis identifies marine mollusc shells as new metagenomic archives of the past.

    Science.gov (United States)

    Der Sarkissian, Clio; Pichereau, Vianney; Dupont, Catherine; Ilsøe, Peter C; Perrigault, Mickael; Butler, Paul; Chauvaud, Laurent; Eiríksson, Jón; Scourse, James; Paillard, Christine; Orlando, Ludovic

    2017-09-01

    Marine mollusc shells enclose a wealth of information on coastal organisms and their environment. Their life history traits as well as (palaeo-) environmental conditions, including temperature, food availability, salinity and pollution, can be traced through the analysis of their shell (micro-) structure and biogeochemical composition. Adding to this list, the DNA entrapped in shell carbonate biominerals potentially offers a novel and complementary proxy both for reconstructing palaeoenvironments and tracking mollusc evolutionary trajectories. Here, we assess this potential by applying DNA extraction, high-throughput shotgun DNA sequencing and metagenomic analyses to marine mollusc shells spanning the last ~7,000 years. We report successful DNA extraction from shells, including a variety of ancient specimens, and find that DNA recovery is highly dependent on their biomineral structure, carbonate layer preservation and disease state. We demonstrate positive taxonomic identification of mollusc species using a combination of mitochondrial DNA genomes, barcodes, genome-scale data and metagenomic approaches. We also find shell biominerals to contain a diversity of microbial DNA from the marine environment. Finally, we reconstruct genomic sequences of organisms closely related to the Vibrio tapetis bacteria from Manila clam shells previously diagnosed with Brown Ring Disease. Our results reveal marine mollusc shells as novel genetic archives of the past, which opens new perspectives in ancient DNA research, with the potential to reconstruct the evolutionary history of molluscs, microbial communities and pathogens in the face of environmental changes. Other future applications include conservation of endangered mollusc species and aquaculture management. © 2017 John Wiley & Sons Ltd.

  2. Web Resources for Metagenomics Studies

    Directory of Open Access Journals (Sweden)

    Pravin Dudhagara

    2015-10-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion of data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We have also rated each tool in order to screen the most promising ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and prospects of metagenomics from a bioinformatics viewpoint.

  3. Information extraction and knowledge graph construction from geoscience literature

    Science.gov (United States)

    Wang, Chengbin; Ma, Xiaogang; Chen, Jianguo; Chen, Jingwen

    2018-03-01

    Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there are limited works on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining the generic and geology terms from geology dictionaries to train Chinese word segmentation rules of the Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed the stop-words from the segmentation results to get a corpus constituted of content-words. Third, we used a statistical method to analyze the semantic links between content-words, and we selected the chord and bigram graphs to visualize the content-words and their links as nodes and edges in a knowledge graph, respectively. The resulting graph presents a clear overview of key information in an unstructured document. This study proves the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
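
    The co-occurrence step can be sketched independently of the Chinese segmentation model: after stop-word removal, count how often content-words appear together in a sentence and keep the counts as weighted edges of a graph. The sentences, stop-word list and networkx usage below are illustrative of the general idea, not the paper's pipeline.

    ```python
    import itertools
    from collections import Counter

    import networkx as nx

    STOP_WORDS = {"the", "of", "and", "in", "is", "are", "with", "a", "for", "along"}

    sentences = [
        "porphyry copper deposits are associated with granodiorite intrusions",
        "granodiorite intrusions occur along the fault zone",
        "the fault zone controls copper mineralization",
    ]

    # Count sentence-level co-occurrences of content words.
    edges = Counter()
    for sentence in sentences:
        words = [w for w in sentence.lower().split() if w not in STOP_WORDS]
        for a, b in itertools.combinations(sorted(set(words)), 2):
            edges[(a, b)] += 1

    graph = nx.Graph()
    for (a, b), weight in edges.items():
        graph.add_edge(a, b, weight=weight)

    # Nodes with many strong links stand out as the key concepts of the corpus.
    ranked = sorted(graph.degree(weight="weight"), key=lambda kv: -kv[1])
    print(ranked[:5])
    ```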

  4. Data Assimilation to Extract Soil Moisture Information from SMAP Observations

    Directory of Open Access Journals (Sweden)

    Jana Kolassa

    2017-11-01

    This study compares different methods to extract soil moisture information through the assimilation of Soil Moisture Active Passive (SMAP) observations. Neural network (NN) and physically-based SMAP soil moisture retrievals were assimilated into the National Aeronautics and Space Administration (NASA) Catchment model over the contiguous United States for April 2015 to March 2017. By construction, the NN retrievals are consistent with the global climatology of the Catchment model soil moisture. Assimilating the NN retrievals without further bias correction improved the surface and root zone correlations against in situ measurements from 14 SMAP core validation sites (CVS) by 0.12 and 0.16, respectively, over the model-only skill, and reduced the surface and root zone unbiased root-mean-square error (ubRMSE) by 0.005 m³ m⁻³ and 0.001 m³ m⁻³, respectively. The assimilation reduced the average absolute surface bias against the CVS measurements by 0.009 m³ m⁻³, but increased the root zone bias by 0.014 m³ m⁻³. Assimilating the NN retrievals after a localized bias correction yielded slightly lower surface correlation and ubRMSE improvements, but generally the skill differences were small. The assimilation of the physically-based SMAP Level-2 passive soil moisture retrievals using a global bias correction yielded similar skill improvements, as did the direct assimilation of locally bias-corrected SMAP brightness temperatures within the SMAP Level-4 soil moisture algorithm. The results show that global bias correction methods may be able to extract more independent information from SMAP observations compared to local bias correction methods, but without accurate quality control and observation error characterization they are also more vulnerable to adverse effects from retrieval errors related to uncertainties in the retrieval inputs and algorithm. Furthermore, the results show that using global bias correction approaches without a

  5. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    Science.gov (United States)

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  6. A probabilistic model to recover individual genomes from metagenomes

    NARCIS (Netherlands)

    J. Dröge (Johannes); A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2017-01-01

    Shotgun metagenomics of microbial communities reveals information about strains of relevance for applications in medicine, biotechnology and ecology. Recovering their genomes is a crucial but very challenging step due to the complexity of the underlying biological system and technical

  7. Earth Science Data Analytics: Preparing for Extracting Knowledge from Information

    Science.gov (United States)

    Kempler, Steven; Barbieri, Lindsay

    2016-01-01

    Data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Data analytics is a broad term that includes data analysis, as well as an understanding of the cognitive processes an analyst uses to understand problems and explore data in meaningful ways. Analytics also includes data extraction, transformation, and reduction, utilizing specific tools, techniques, and methods. Turning to data science, definitions of data science sound very similar to those of data analytics (which leads to a lot of the confusion between the two). But the skills needed for both, co-analyzing large amounts of heterogeneous data, understanding and utilizing relevant tools and techniques, and subject matter expertise, although similar, serve different purposes. Data analytics takes a practitioner's approach to applying expertise and skills to solve issues and gain subject knowledge. Data science is more theoretical (research in itself) in nature, providing strategic actionable insights and new innovative methodologies. Earth Science Data Analytics (ESDA) is the process of examining, preparing, reducing, and analyzing large amounts of spatial (multi-dimensional), temporal, or spectral data using a variety of data types to uncover patterns, correlations and other information, to better understand our Earth. The large variety of datasets (temporal and spatial differences, data types, formats, etc.) invites the need for data analytics skills that combine an understanding of the science domain with data preparation, reduction, and analysis techniques, from a practitioner's point of view. The application of these skills to ESDA is the focus of this presentation. The Earth Science Information Partners (ESIP) Federation Earth Science Data Analytics (ESDA) Cluster was created in recognition of the practical need to facilitate the co-analysis of large amounts of data and information for Earth science. Thus, from a to

  8. Testing the reliability of information extracted from ancient zircon

    Science.gov (United States)

    Kielman, Ross; Whitehouse, Martin; Nemchin, Alexander

    2015-04-01

    Studies combining zircon U-Pb chronology, trace element distribution as well as O and Hf isotope systematics are a powerful way to gain understanding of the processes shaping Earth's evolution, especially in detrital populations where constraints from the original host are missing. Such studies of the Hadean detrital zircon population abundant in sedimentary rocks in Western Australia have involved analysis of an unusually large number of individual grains, but have also highlighted potential problems with the approach, only apparent when multiple analyses are obtained from individual grains. A common feature of the Hadean as well as many early Archaean zircon populations is their apparent inhomogeneity, which reduces confidence in conclusions based on studies combining the chemistry and isotopic characteristics of zircon. In order to test the reliability of information extracted from early Earth zircon, we report results from one of the first in-depth multi-method studies of zircon from a relatively simple early Archean magmatic rock, used as an analogue to ancient detrital zircon. The approach involves making multiple SIMS analyses in individual grains in order to be comparable to the most advanced studies of detrital zircon populations. The investigated sample is a relatively undeformed, non-migmatitic ca. 3.8 Ga tonalite collected a few km south of the Isua Greenstone Belt, southwest Greenland. Extracted zircon grains can be combined into three different groups based on the behavior of their U-Pb systems: (i) grains that show internally consistent and concordant ages and define an average age of 3805±15 Ma, taken to be the age of the rock, (ii) grains that are distributed close to the concordia line, but with significant variability between multiple analyses, suggesting ancient Pb loss, and (iii) grains that have multiple analyses distributed along a discordia pointing towards a zero intercept, indicating geologically recent Pb loss. This overall behavior has

  9. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast, while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short fragment length (Sanger sequencing, for example, yields on average 700 bp fragments) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large-scale training, our method provides fast single-fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large-scale machine learning methods are well-suited for gene
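
    The feature side of such a two-stage scheme can be sketched as follows: scan a fragment for ORF candidates, compute monocodon-usage, GC-content and length features, and combine them with a logistic score. The weights below are illustrative placeholders, not learned discriminants or a trained neural network.

    ```python
    import math
    from itertools import product

    CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
    START, STOPS = "ATG", {"TAA", "TAG", "TGA"}

    def orf_candidates(seq, min_len=60):
        # Scan all three reading frames for start..stop spans above min_len.
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        yield seq[start:i + 3]
                    start = None

    def features(orf):
        codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
        usage = {c: codons.count(c) / len(codons) for c in CODONS}
        gc = (orf.count("G") + orf.count("C")) / len(orf)
        return usage, gc, len(orf)

    def coding_probability(orf):
        usage, gc, length = features(orf)
        # Illustrative linear combination standing in for discriminants + ANN.
        score = (4.0 * usage.get("GCC", 0) + 3.0 * usage.get("AAG", 0)
                 + 2.0 * (gc - 0.5) + 0.002 * length - 0.5)
        return 1.0 / (1.0 + math.exp(-score))

    fragment = "ATGGCCAAGGCCAAGGCTGCCGGCAAGGCCGGCTAA" * 3
    for orf in orf_candidates(fragment, min_len=30):
        print(f"{orf[:21]}... p(coding) = {coding_probability(orf):.2f}")
    ```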

  10. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    Science.gov (United States)

    Howe, Adina; Chain, Patrick S. G.

    2015-01-01

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembling, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow. PMID:26217314

  11. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    Directory of Open Access Journals (Sweden)

    Adina eHowe

    2015-07-01

    Full Text Available Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  12. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    Science.gov (United States)

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract, from these files, the exposure-relevant information contained in the structured elements of the DICOM metadata. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
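
    A comparable extraction can be prototyped in Python with the pydicom library (the study itself used Matlab); the sketch below is an assumed way of walking the content tree of a CT radiation dose structured report, and the file name and concept keywords are illustrative, not taken from the original program.

        # Hedged sketch: walk a DICOM dose structured report with pydicom and collect
        # numeric content items whose concept name mentions a dose index. The file
        # path and the keyword list are illustrative assumptions.
        import pydicom

        def dose_items(dataset, keywords=("CTDIvol", "DLP", "Exposure Time", "KVP")):
            """Yield (concept name, value, unit) triples from the SR content tree."""
            for item in getattr(dataset, "ContentSequence", []):
                name = item.ConceptNameCodeSequence[0].CodeMeaning if "ConceptNameCodeSequence" in item else ""
                if "MeasuredValueSequence" in item and any(k in name for k in keywords):
                    mv = item.MeasuredValueSequence[0]
                    unit = mv.MeasurementUnitsCodeSequence[0].CodeValue
                    yield name, float(mv.NumericValue), unit
                yield from dose_items(item, keywords)  # recurse into nested containers

        ds = pydicom.dcmread("ct_dose_report.dcm")  # hypothetical dose report file
        for name, value, unit in dose_items(ds):
            print(f"{name}: {value} {unit}")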

  13. Medicaid Analytic eXtract (MAX) General Information

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicaid Analytic eXtract (MAX) data is a set of person-level data files on Medicaid eligibility, service utilization, and payments. The MAX data are created to...

  14. Marine Metagenome as A Resource for Novel Enzymes

    KAUST Repository

    Alma’abadi, Amani D.

    2015-11-10

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data.

  15. Marine Metagenome as A Resource for Novel Enzymes

    Directory of Open Access Journals (Sweden)

    Amani D. Alma’abadi

    2015-10-01

    Full Text Available More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to a great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data.

  16. Deployment and Preparation of Metagenomic Analysis on the EELA Grid

    International Nuclear Information System (INIS)

    Aparicio, G.; Blanquer, I.; Hernandez, V.; Pignatelli, M.; Tamames, J.

    2007-01-01

    In many cases, the sequencing of the DNA of many microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone, and require the interaction with other organisms. In such cases, the information of the DNA available belongs to different kinds of organisms. Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. This can be achieved through a process of extraction of fragments, comparison and analysis of the function. By comparison with existing chains whose function is well known, fragments can be classified. This process is computationally expensive and requires several iterations of alignment and phylogeny classification steps. Source samples reach several millions of sequences, which could reach up to thousands of nucleotides each. These sequences are compared to a selected part of the Non-redundant database which only implies the information from eukaryotic species. From this first analysis, a refining process is performed and alignment analysis is restarted from the results. This process implies several CPU years. An environment has been developed to fragment, automate and check the above operations. This environment has been tuned up from an experimental study which has tested the most efficient and reliable resources, the optimal job size, and the data transference and database reindexation overhead. The environment should re-submit faulty jobs, detect endless tasks and ensure that the results are correctly retrieved and the workflow synchronised. The paper will give an outline on the structure of the system, and the preparation steps performed to deal with this experiment. (Author)
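
    The fragmentation step mentioned above, splitting millions of query sequences into grid-sized work units, reduces to a simple chunking routine; the sketch below is a generic illustration rather than the EELA environment itself, and the chunk size and file names are arbitrary assumptions.

        # Generic sketch of splitting a large FASTA query set into fixed-size chunks,
        # one chunk per grid job. Chunk size and file names are illustrative assumptions.
        from pathlib import Path

        def split_fasta(path, out_dir, seqs_per_job=5000):
            out_dir = Path(out_dir)
            out_dir.mkdir(exist_ok=True)
            chunk, count, job = [], 0, 0
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">") and count == seqs_per_job:
                        (out_dir / f"job_{job:04d}.fasta").write_text("".join(chunk))
                        chunk, count, job = [], 0, job + 1
                    if line.startswith(">"):
                        count += 1
                    chunk.append(line)
            if chunk:
                (out_dir / f"job_{job:04d}.fasta").write_text("".join(chunk))
            return job + 1

        print(split_fasta("metagenome_reads.fasta", "grid_jobs"), "job files written")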

  17. Information Extraction with Character-level Neural Networks and Free Noisy Supervision

    OpenAIRE

    Meerkamp, Philipp; Zhou, Zhengyi

    2016-01-01

    We present an architecture for information extraction from text that augments an existing parser with a character-level neural network. The network is trained using a measure of consistency of extracted data with existing databases as a form of noisy supervision. Our architecture combines the ability of constraint-based information extraction systems to easily incorporate domain knowledge and constraints with the ability of deep neural networks to leverage large amounts of data to learn compl...

  18. Metagenomic analysis of microbial communities and beyond

    DEFF Research Database (Denmark)

    Schreiber, Lars

    2014-01-01

    From small clone libraries to large next-generation sequencing datasets – the field of community genomics or metagenomics has developed tremendously within the last years. This chapter will summarize some of these developments and will also highlight pitfalls of current metagenomic analyses...... heterologous expression of metagenomic DNA fragments to discover novel metabolic functions. Lastly, the chapter will shortly discuss the meta-analysis of gene expression of microbial communities, more precisely metatranscriptomics and metaproteomics....

  19. Metagenomic analysis of the airborne environment in urban spaces.

    Science.gov (United States)

    Be, Nicholas A; Thissen, James B; Fofanov, Viacheslav Y; Allen, Jonathan E; Rojas, Mark; Golovko, George; Fofanov, Yuriy; Koshinsky, Heather; Jaing, Crystal J

    2015-02-01

    The organisms in aerosol microenvironments, especially densely populated urban areas, are relevant to maintenance of public health and detection of potential epidemic or biothreat agents. To examine aerosolized microorganisms in this environment, we performed sequencing on the material from an urban aerosol surveillance program. Whole metagenome sequencing was applied to DNA extracted from air filters obtained during periods from each of the four seasons. The composition of bacteria, plants, fungi, invertebrates, and viruses demonstrated distinct temporal shifts. Bacillus thuringiensis serovar kurstaki was detected in samples known to be exposed to aerosolized spores, illustrating the potential utility of this approach for identification of intentionally introduced microbial agents. Together, these data demonstrate the temporally dependent metagenomic complexity of urban aerosols and the potential of genomic analytical techniques for biosurveillance and monitoring of threats to public health.

  20. Semantics-based information extraction for detecting economic events

    NARCIS (Netherlands)

    A.C. Hogenboom (Alexander); F. Frasincar (Flavius); K. Schouten (Kim); O. van der Meer

    2013-01-01

    textabstractAs today's financial markets are sensitive to breaking news on economic events, accurate and timely automatic identification of events in news items is crucial. Unstructured news items originating from many heterogeneous sources have to be mined in order to extract knowledge useful for

  1. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    Science.gov (United States)

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  2. An Effective Approach to Biomedical Information Extraction with Limited Training Data

    Science.gov (United States)

    Jonnalagadda, Siddhartha

    2011-01-01

    In the current millennium, extensive use of computers and the internet caused an exponential increase in information. Few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text. Limitations in the size of training data, lack of lexicons and lack of…

  3. Metagenomics and the protein universe

    Science.gov (United States)

    Godzik, Adam

    2011-01-01

    Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments are due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084

  4. A rapid extraction of landslide disaster information research based on GF-1 image

    Science.gov (United States)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently because of seismic activity, bringing great harm to people's lives. They have drawn high attention from the state and extensive concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high resolution remote sensing imagery can improve the accuracy of information extraction effectively with its rich texture and geometry information. Therefore, it is feasible to extract the information of earthquake-triggered landslides with serious surface damage and large scale. Taking Wenchuan county as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the estimation of scale parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution images and selecting the spectral, texture, geometric and landform features of the image, extraction rules can be established to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.

  5. Challenges and Opportunities of Airborne Metagenomics

    KAUST Repository

    Behzad, H.; Gojobori, Takashi; Mineta, K.

    2015-01-01

    microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles.

  6. Integrative Workflows for Metagenomic Analysis

    Directory of Open Access Journals (Sweden)

    Efthymios eLadoukakis

    2014-11-01

    Full Text Available The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many gigabytes of data being generated from each single sequencing experiment, rendering the management, or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires substantial computational resources, imposing, as the sole realistic solution, the utilization of cloud computing infrastructures. In this review article we discuss the different, integrative bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control and annotation of metagenomic data, embracing various major sequencing technologies and applications.

  7. Towards an information extraction and knowledge formation framework based on Shannon entropy

    Directory of Open Access Journals (Sweden)

    Iliescu Dragoș

    2017-01-01

    Full Text Available The subject of information quantity is approached in this paper, considering the specific domain of nonconforming product management as the information source. This work represents a case study. Raw data were gathered from a heavy industrial works company, with information extraction and knowledge formation being considered herein. The method used for information quantity estimation is based on the Shannon entropy formula. The information and entropy spectrum are decomposed and analysed for the extraction of specific information and the formation of knowledge. The results of the entropy analysis point out the information that needs to be acquired by the involved organisation, this being presented as a specific knowledge type.
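
    The estimation rests on the Shannon entropy H = -sum(p_i * log2 p_i) of a discrete distribution; the short sketch below only illustrates that formula, with made-up counts of nonconforming-product categories standing in for the company data.

        # Shannon entropy of a discrete distribution, here over hypothetical categories
        # of nonconforming products (counts are made up for illustration).
        import math

        def shannon_entropy(counts):
            total = sum(counts.values())
            probs = [c / total for c in counts.values() if c > 0]
            return -sum(p * math.log2(p) for p in probs)

        defect_counts = {"dimensional": 40, "surface": 25, "material": 20, "assembly": 15}
        print(f"H = {shannon_entropy(defect_counts):.3f} bits")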

  8. Metagenomic frameworks for monitoring antibiotic resistance in aquatic environments.

    Science.gov (United States)

    Port, Jesse A; Cullen, Alison C; Wallace, James C; Smith, Marissa N; Faustman, Elaine M

    2014-03-01

    High-throughput genomic technologies offer new approaches for environmental health monitoring, including metagenomic surveillance of antibiotic resistance determinants (ARDs). Although natural environments serve as reservoirs for antibiotic resistance genes that can be transferred to pathogenic and human commensal bacteria, monitoring of these determinants has been infrequent and incomplete. Furthermore, surveillance efforts have not been integrated into public health decision making. We used a metagenomic epidemiology-based approach to develop an ARD index that quantifies antibiotic resistance potential, and we analyzed this index for common modal patterns across environmental samples. We also explored how metagenomic data such as this index could be conceptually framed within an early risk management context. We analyzed 25 published data sets from shotgun pyrosequencing projects. The samples consisted of microbial community DNA collected from marine and freshwater environments across a gradient of human impact. We used principal component analysis to identify index patterns across samples. We observed significant differences in the overall index and index subcategory levels when comparing ecosystems more proximal versus distal to human impact. The selection of different sequence similarity thresholds strongly influenced the index measurements. Unique index subcategory modes distinguished the different metagenomes. Broad-scale screening of ARD potential using this index revealed utility for framing environmental health monitoring and surveillance. This approach holds promise as a screening tool for establishing baseline ARD levels that can be used to inform and prioritize decision making regarding management of ARD sources and human exposure routes. Port JA, Cullen AC, Wallace JC, Smith MN, Faustman EM. 2014. Metagenomic frameworks for monitoring antibiotic resistance in aquatic environments. Environ Health Perspect 122:222–228; http://dx.doi.org/10.1289/ehp
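
    A minimal sketch of the pattern analysis described here, assuming a samples-by-ARD-category abundance matrix with fabricated values: after per-sample normalization, principal component analysis separates metagenomes by their resistance profiles. This shows only the generic statistical step, not the authors' index or pipeline.

        # Hedged sketch: PCA over a samples-by-ARD-category abundance matrix.
        # Matrix values and category names are synthetic placeholders.
        import numpy as np
        from sklearn.decomposition import PCA

        categories = ["beta_lactam", "tetracycline", "sulfonamide", "aminoglycoside"]
        # rows: metagenome samples ordered from human-impacted to pristine (fabricated numbers)
        abundance = np.array([
            [120, 80, 60, 30],
            [100, 70, 55, 25],
            [ 20, 10,  8,  5],
            [ 15,  8,  6,  4],
        ], dtype=float)

        # normalize per sample so each profile reflects relative ARD potential
        profiles = abundance / abundance.sum(axis=1, keepdims=True)
        pca = PCA(n_components=2)
        coords = pca.fit_transform(profiles)
        print(coords)
        print("explained variance:", pca.explained_variance_ratio_)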

  9. Extracting local information from crowds through betting markets

    Science.gov (United States)

    Weijs, Steven

    2015-04-01

    In this research, a set-up is considered in which users can bet against a forecasting agency to challenge their probabilistic forecasts. From an information theory standpoint, a reward structure is considered that either provides the forecasting agency with better information, paying the successful providers of information for their winning bets, or funds excellent forecasting agencies through users that think they know better. Especially for local forecasts, the approach may help to diagnose model biases and to identify local predictive information that can be incorporated in the models. The challenges and opportunities for implementing such a system in practice are also discussed.

  10. Spoken Language Understanding Systems for Extracting Semantic Information from Speech

    CERN Document Server

    Tur, Gokhan

    2011-01-01

    Spoken language understanding (SLU) is an emerging field in between speech and language processing, investigating human/ machine and human/ human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances and its applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, usin

  11. Sifting Through Chaos: Extracting Information from Unstructured Legal Opinions.

    Science.gov (United States)

    Oliveira, Bruno Miguel; Guimarães, Rui Vasconcellos; Antunes, Luís; Rodrigues, Pedro Pereira

    2018-01-01

    Abiding by the law is, in some cases, a delicate balance between the rights of different players. Re-using health records is such a case. While the law grants reuse rights to public administration documents, in which health records produced in public health institutions are included, it also grants privacy to personal records. To safeguard correct usage of data, public hospitals in Portugal employ jurists who are responsible for allowing or withholding access rights to health records. To help decision making, these jurists can consult the legal opinions issued by the national committee on public administration document usage. While these legal opinions are of undeniable value due to their doctrinal contribution, they are only available in a format best suited for printing, forcing individual consultation of each document, with no option whatsoever for clustered search, filtering or indexing, which are standard operations nowadays in a document management system. When having to decide on tens of data requests a day, it becomes unfeasible to consult the hundreds of legal opinions already available. With the objective of creating a modern document management system, we devised an open, platform-agnostic system that extracts and compiles the legal opinions, extracts their contents and produces metadata, allowing for fast searching and filtering of said legal opinions.

  12. Back to the Future of Soil Metagenomics.

    Czech Academy of Sciences Publication Activity Database

    Nesme J, J.; Achouak, W.; Agathos SN, S.N.; Bailey, M.; Baldrian, Petr; Brunel, D.; Frostegård, Å.; Heulin, T.; Jansson JK, J.K.; Jurkevitch, E.; Kruus, K.L.; Kowalchuk, G.A.; Lagares, A.; Lapin-Scott, H.M.; Lemanceau, P.; Le Paslier, D.; Mandic-Mulec, I.; Murrell, J.C.; Myrold, D.D.; Nalin, R.; Nannipieri, P.; Neufeld, J.D.; O'Gara, F.; Parnell, J.J.; Pühler, A.; Pylro, V.; Ramos, J.L.; Roesch, L.F.; Schloter, M.; Schleper, C.; Sczyrba, A.; Sessitsch, A.; Sjöling, S.; Sørensen, J.; Sørensen, S.J.; Tebbe, C.C.; Topp, E.; Tsiamis, G.; van Elsas, J.D.; van Keulen, G.; Widmer, F.; Wagner, M.; Zhang, T.; Zhang, X.; Zhao, L; Zhu, Y-G.; Vogel, T.M.; Simonet, P.

    2016-01-01

    Roč. 7, FEB 10 (2016), s. 73 ISSN 1664-302X Institutional support: RVO:61388971 Keywords: metagenomic; soil microbiology; terrestrial microbiology Subject RIV: EE - Microbiology, Virology Impact factor: 4.076, year: 2016

  13. Information extraction from FN plots of tungsten microemitters

    Energy Technology Data Exchange (ETDEWEB)

    Mussa, Khalil O. [Department of Physics, Mu'tah University, Al-Karak (Jordan); Mousa, Marwan S., E-mail: mmousa@mutah.edu.jo [Department of Physics, Mu'tah University, Al-Karak (Jordan); Fischer, Andreas, E-mail: andreas.fischer@physik.tu-chemnitz.de [Institut für Physik, Technische Universität Chemnitz, Chemnitz (Germany)

    2013-09-15

    Tungsten based microemitter tips have been prepared both clean and coated with dielectric materials. For clean tungsten tips, apex radii have been varied ranging from 25 to 500 nm. These tips were manufactured by electrochemical etching of a 0.1 mm diameter high purity (99.95%) tungsten wire at the meniscus of a two molar NaOH solution. Composite micro-emitters considered here consist of a tungsten core coated with different dielectric materials—such as magnesium oxide (MgO), sodium hydroxide (NaOH), tetracyanoethylene (TCNE), and zinc oxide (ZnO). It is worthwhile noting here that the rather unconventional NaOH coating has shown several interesting properties. Various properties of these emitters were measured including current–voltage (IV) characteristics and the physical shape of the tips. A conventional field emission microscope (FEM) with a tip (cathode)–screen (anode) separation standardized at 10 mm was used to electrically characterize the electron emitters. The system was evacuated down to a base pressure of ∼10^−8 mbar when baked at up to ∼180°C overnight. This allowed measurements of typical field electron emission (FE) characteristics, namely the IV characteristics and the emission images on a conductive phosphorus screen (the anode). Mechanical characterization has been performed through a FEI scanning electron microscope (SEM). Within this work, the mentioned experimental results are connected to the theory for analyzing Fowler–Nordheim (FN) plots. We compared and evaluated the data extracted from clean tungsten tips of different radii and determined deviations between the results of different extraction methods applied. In particular, we derived the apex radii of several clean and coated tungsten tips by both SEM imaging and analyzing FN plots. The aim of this analysis is to support the ongoing discussion on recently developed improvements of the theory for analyzing FN plots related to metal field electron emitters, which in
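
    The elementary step behind this kind of analysis is a straight-line fit of ln(I/V^2) against 1/V; the sketch below performs only that fit on fabricated current-voltage data and does not reproduce the refined FN-plot theory discussed in the abstract.

        # Elementary FN-plot analysis: fit ln(I/V^2) versus 1/V and report slope and
        # intercept. The current-voltage values below are fabricated for illustration.
        import numpy as np

        V = np.array([800.0, 900.0, 1000.0, 1100.0, 1200.0])   # volts
        I = np.array([2e-9, 1.2e-8, 5e-8, 1.6e-7, 4.5e-7])     # amperes

        x = 1.0 / V
        y = np.log(I / V**2)
        slope, intercept = np.polyfit(x, y, 1)
        print(f"FN slope = {slope:.3e} V, intercept = {intercept:.3f}")
        # In the simplest planar approximation the slope scales with phi^(3/2)*d/beta,
        # so with an assumed work function and gap an effective field enhancement
        # factor (and hence an apex radius estimate) can be derived from it.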

  14. Information extraction from FN plots of tungsten microemitters

    International Nuclear Information System (INIS)

    Mussa, Khalil O.; Mousa, Marwan S.; Fischer, Andreas

    2013-01-01

    Tungsten based microemitter tips have been prepared both clean and coated with dielectric materials. For clean tungsten tips, apex radii have been varied ranging from 25 to 500 nm. These tips were manufactured by electrochemical etching a 0.1 mm diameter high purity (99.95%) tungsten wire at the meniscus of two molar NaOH solution. Composite micro-emitters considered here are consisting of a tungsten core coated with different dielectric materials—such as magnesium oxide (MgO), sodium hydroxide (NaOH), tetracyanoethylene (TCNE), and zinc oxide (ZnO). It is worthwhile noting here, that the rather unconventional NaOH coating has shown several interesting properties. Various properties of these emitters were measured including current–voltage (IV) characteristics and the physical shape of the tips. A conventional field emission microscope (FEM) with a tip (cathode)–screen (anode) separation standardized at 10 mm was used to electrically characterize the electron emitters. The system was evacuated down to a base pressure of ∼10 −8 mbar when baked at up to ∼180°C overnight. This allowed measurements of typical field electron emission (FE) characteristics, namely the IV characteristics and the emission images on a conductive phosphorus screen (the anode). Mechanical characterization has been performed through a FEI scanning electron microscope (SEM). Within this work, the mentioned experimental results are connected to the theory for analyzing Fowler–Nordheim (FN) plots. We compared and evaluated the data extracted from clean tungsten tips of different radii and determined deviations between the results of different extraction methods applied. In particular, we derived the apex radii of several clean and coated tungsten tips by both SEM imaging and analyzing FN plots. The aim of this analysis is to support the ongoing discussion on recently developed improvements of the theory for analyzing FN plots related to metal field electron emitters, which in

  15. BeerDeCoded: the open beer metagenome project.

    Science.gov (United States)

    Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo

    2017-01-01

    Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer.

  16. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Science.gov (United States)

    Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

    2012-01-01

    MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
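
    One of the steps such pipelines automate, abundance calculation from reads mapped against a reference gene set, reduces to normalizing counts by gene length and sequencing depth; the sketch below shows that generic calculation with fabricated counts and is not MOCAT code.

        # Generic length- and depth-normalized abundance from mapped read counts
        # (values fabricated; this is not part of MOCAT itself).
        gene_lengths = {"geneA": 1500, "geneB": 900, "geneC": 3000}   # bp
        mapped_reads = {"geneA": 450, "geneB": 300, "geneC": 600}

        total_reads = sum(mapped_reads.values())
        abundance = {
            g: (mapped_reads[g] / gene_lengths[g]) / total_reads * 1e9  # RPKM-like value
            for g in gene_lengths
        }
        for gene, value in sorted(abundance.items()):
            print(f"{gene}: {value:.1f}")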

  17. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Directory of Open Access Journals (Sweden)

    Jens Roat Kultima

    Full Text Available MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  18. Cyclodipeptides from metagenomic library of a japanese marine sponge

    Energy Technology Data Exchange (ETDEWEB)

    He, Rui; Wang, Bochu; Zhub, Liancai, E-mail: wangbc2000@126.com [Bioengineering College, Chongqing University, Chongqing, (China); Wang, Manyuan [School of Traditional Chinese Medicine, Capital University of Medical Sciences, Beijing (China); Wakimoto, Toshiyuki; Abe, Ikuro, E-mail: abei@mol.f.u-tokyo.ac.jp [Graduate School of Pharmaceutical Sciences, The University of Tokyo, Tokyo (Japan)

    2013-12-01

    Culture-independent metagenomics is an attractive and promising approach to explore unique bioactive small molecules from marine sponges harboring uncultured symbiotic microbes. Therefore, we conducted functional screening of the metagenomic library constructed from the Japanese marine sponge Discodermia calyx. Bioassay-guided fractionation of the plate culture extract of antibacterial clone pDC113 afforded eleven cyclodipeptides: Cyclo(l-Thr-l-Leu) (1), Cyclo(l-Val-d-Pro) (2), Cyclo(l-Ile-d-Pro) (3), Cyclo(l-Leu-l-Pro) (4), Cyclo(l-Val-l-Leu) (5), Cyclo(l-Leu-l-Ile) (6), Cyclo(l-Leu-l-Leu) (7), Cyclo(l-Phe-l-Tyr) (8), Cyclo(l-Trp-l-Pro) (9), Cyclo(l-Val-l-Trp) (10) and Cyclo(l-Ile-l-Trp) (11). To the best of our knowledge, these are the first cyclodipeptides isolated from a metagenomic library. Sequence analysis suggested that the isolated cyclodipeptides were not synthesized by nonribosomal peptide synthetases and that there was no significant indication of cyclodipeptide synthetases. (author)

  19. Culture-independent discovery of natural products from soil metagenomes.

    Science.gov (United States)

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  20. Cyclodipeptides from metagenomic library of a japanese marine sponge

    International Nuclear Information System (INIS)

    He, Rui; Wang, Bochu; Zhub, Liancai; Wang, Manyuan; Wakimoto, Toshiyuki; Abe, Ikuro

    2013-01-01

    Culture-independent metagenomics is an attractive and promising approach to explore unique bioactive small molecules from marine sponges harboring uncultured symbiotic microbes. Therefore, we conducted functional screening of the metagenomic library constructed from the Japanese marine sponge Discodermia calyx. Bioassay-guided fractionation of the plate culture extract of antibacterial clone pDC113 afforded eleven cyclodipeptides: Cyclo(l-Thr-l-Leu) (1), Cyclo(l-Val-d-Pro) (2), Cyclo(l-Ile-d-Pro) (3), Cyclo(l-Leu-l-Pro) (4), Cyclo(l-Val-l-Leu) (5), Cyclo(l-Leu-l-Ile) (6), Cyclo(l-Leu-l-Leu) (7), Cyclo(l-Phe-l-Tyr) (8), Cyclo(l-Trp-l-Pro) (9), Cyclo(l-Val-l-Trp) (10) and Cyclo(l-Ile-l-Trp) (11). To the best of our knowledge, these are the first cyclodipeptides isolated from a metagenomic library. Sequence analysis suggested that the isolated cyclodipeptides were not synthesized by nonribosomal peptide synthetases and that there was no significant indication of cyclodipeptide synthetases. (author)

  1. Metagenomic applications in environmental monitoring and bioremediation.

    Science.gov (United States)

    Techtmann, Stephen M; Hazen, Terry C

    2016-10-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  2. Optimal Information Extraction of Laser Scanning Dataset by Scale-Adaptive Reduction

    Science.gov (United States)

    Zang, Y.; Yang, B.

    2018-04-01

    3D laser technology is widely used to collect the surface information of objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. To solve the problem, most existing methods extract important points based on a fixed scale. However, the geometric features of a 3D object come from various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply the perception metric Just-Noticeable-Difference to measure the degradation of each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction from objects.
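
    A much simplified stand-in for importance-based point reduction is sketched below: each point is scored by how far it deviates from the centroid of its local neighbourhood and only the highest-scoring fraction is kept. The multi-scale radial basis function construction and the Just-Noticeable-Difference metric of the paper are not reproduced; the point cloud and parameters are synthetic.

        # Simplified importance-based point reduction (not the paper's method): score
        # each point by its deviation from the centroid of its k nearest neighbours
        # and keep the highest-scoring fraction.
        import numpy as np
        from scipy.spatial import cKDTree

        def reduce_points(points, keep_fraction=0.3, k=10):
            tree = cKDTree(points)
            _, idx = tree.query(points, k=k + 1)      # first neighbour is the point itself
            neighborhoods = points[idx[:, 1:]]
            importance = np.linalg.norm(points - neighborhoods.mean(axis=1), axis=1)
            n_keep = max(1, int(keep_fraction * len(points)))
            return points[np.argsort(importance)[-n_keep:]]

        rng = np.random.default_rng(1)
        cloud = rng.normal(size=(1000, 3))            # synthetic point cloud
        print(reduce_points(cloud, keep_fraction=0.2).shape)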

  3. OPTIMAL INFORMATION EXTRACTION OF LASER SCANNING DATASET BY SCALE-ADAPTIVE REDUCTION

    Directory of Open Access Journals (Sweden)

    Y. Zang

    2018-04-01

    Full Text Available 3D laser technology is widely used to collect the surface information of objects. For various applications, we need to extract a point cloud of good perceptual quality from the scanned points. To solve the problem, most existing methods extract important points based on a fixed scale. However, the geometric features of a 3D object come from various geometric scales. We propose a multi-scale construction method based on radial basis functions. For each scale, important points are extracted from the point cloud based on their importance. We apply the perception metric Just-Noticeable-Difference to measure the degradation of each geometric scale. Finally, scale-adaptive optimal information extraction is realized. Experiments are undertaken to evaluate the effectiveness of the proposed method, suggesting a reliable solution for optimal information extraction from objects.

  4. Metagenomics: The Next Culture-Independent Game Changer

    Directory of Open Access Journals (Sweden)

    Jessica D. Forbes

    2017-07-01

    Full Text Available A trend towards the abandonment of obtaining pure culture isolates in frontline laboratories is at a crossroads with the ability of public health agencies to perform their basic mandate of foodborne disease surveillance and response. The implementation of culture-independent diagnostic tests (CIDTs including nucleic acid and antigen-based assays for acute gastroenteritis is leaving public health agencies without laboratory evidence to link clinical cases to each other and to food or environmental substances. This limits the efficacy of public health epidemiology and surveillance as well as outbreak detection and investigation. Foodborne outbreaks have the potential to remain undetected or have insufficient evidence to support source attribution and may inadvertently increase the incidence of foodborne diseases. Next-generation sequencing of pure culture isolates in clinical microbiology laboratories has the potential to revolutionize the fields of food safety and public health. Metagenomics and other ‘omics’ disciplines could provide the solution to a cultureless future in clinical microbiology, food safety and public health. Data mining of information obtained from metagenomics assays can be particularly useful for the identification of clinical causative agents or foodborne contamination, detection of AMR and/or virulence factors, in addition to providing high-resolution subtyping data. Thus, metagenomics assays may provide a universal test for clinical diagnostics, foodborne pathogen detection, subtyping and investigation. This information has the potential to reform the field of enteric disease diagnostics and surveillance and also infectious diseases as a whole. The aim of this review will be to present the current state of CIDTs in diagnostic and public health laboratories as they relate to foodborne illness and food safety. Moreover, we will also discuss the diagnostic and subtyping utility and concomitant bias limitations of

  5. Information extraction from FN plots of tungsten microemitters.

    Science.gov (United States)

    Mussa, Khalil O; Mousa, Marwan S; Fischer, Andreas

    2013-09-01

    Tungsten based microemitter tips have been prepared both clean and coated with dielectric materials. For clean tungsten tips, apex radii have been varied ranging from 25 to 500 nm. These tips were manufactured by electrochemical etching a 0.1 mm diameter high purity (99.95%) tungsten wire at the meniscus of two molar NaOH solution. Composite micro-emitters considered here are consisting of a tungsten core coated with different dielectric materials-such as magnesium oxide (MgO), sodium hydroxide (NaOH), tetracyanoethylene (TCNE), and zinc oxide (ZnO). It is worthwhile noting here, that the rather unconventional NaOH coating has shown several interesting properties. Various properties of these emitters were measured including current-voltage (IV) characteristics and the physical shape of the tips. A conventional field emission microscope (FEM) with a tip (cathode)-screen (anode) separation standardized at 10 mm was used to electrically characterize the electron emitters. The system was evacuated down to a base pressure of ∼10(-8) mbar when baked at up to ∼180 °C overnight. This allowed measurements of typical field electron emission (FE) characteristics, namely the IV characteristics and the emission images on a conductive phosphorus screen (the anode). Mechanical characterization has been performed through a FEI scanning electron microscope (SEM). Within this work, the mentioned experimental results are connected to the theory for analyzing Fowler-Nordheim (FN) plots. We compared and evaluated the data extracted from clean tungsten tips of different radii and determined deviations between the results of different extraction methods applied. In particular, we derived the apex radii of several clean and coated tungsten tips by both SEM imaging and analyzing FN plots. The aim of this analysis is to support the ongoing discussion on recently developed improvements of the theory for analyzing FN plots related to metal field electron emitters, which in particular

  6. A Delphi Technology Foresight Study: Mapping Social Construction of Scientific Evidence on Metagenomics Tests for Water Safety.

    Directory of Open Access Journals (Sweden)

    Stanislav Birko

    Full Text Available Access to clean water is a grand challenge in the 21st century. Water safety testing for pathogens currently depends on surrogate measures such as fecal indicator bacteria (e.g., E. coli). Metagenomics concerns high-throughput, culture-independent, unbiased shotgun sequencing of DNA from environmental samples that might transform water safety by detecting waterborne pathogens directly instead of their surrogates. Yet emerging innovations such as metagenomics are often fiercely contested. Innovations are subject to shaping/construction not only by technology but also by the social systems/values in which they are embedded, such as experts' attitudes towards new scientific evidence. We conducted a classic three-round Delphi survey, comprised of 107 questions. A multidisciplinary expert panel (n = 24) representing the continuum of discovery scientists and policymakers evaluated the emergence of metagenomics tests. To the best of our knowledge, we report here the first Delphi foresight study of experts' attitudes on (1) the top 10 priority evidentiary criteria for adoption of metagenomics tests for water safety, (2) the specific issues critical to governance of the metagenomics innovation trajectory where there is consensus or dissensus among experts, (3) the anticipated time lapse from discovery to practice of metagenomics tests, and (4) the role and timing of public engagement in development of metagenomics tests. The ability of a test to distinguish between harmful and benign waterborne organisms, analytical/clinical sensitivity, and reproducibility were the top three evidentiary criteria for adoption of metagenomics. Experts agree that metagenomic testing will provide novel information but there is dissensus on whether metagenomics will replace the current water safety testing methods or impact the public health end points (e.g., reduction in boil water advisories). Interestingly, experts view the publics as relevant in a "downstream capacity" for adoption of

  7. Study on methods and techniques of aeroradiometric weak information extraction for sandstone-hosted uranium deposits based on GIS

    International Nuclear Information System (INIS)

    Han Shaoyang; Ke Dan; Hou Huiqun

    2005-01-01

    The weak information extraction is one of the important research contents in current sandstone-type uranium prospecting in China. This paper introduces the connotation of aeroradiometric weak information extraction, discusses the formation theories of aeroradiometric weak information, and establishes some effective mathematic models for weak information extraction. Models for weak information extraction are realized on a GIS software platform, and application tests of weak information extraction are completed in known uranium mineralized areas. Research results prove that the prospective areas of sandstone-type uranium deposits can be rapidly delineated by extracting aeroradiometric weak information. (authors)

  8. Extraction of Graph Information Based on Image Contents and the Use of Ontology

    Science.gov (United States)

    Kanjanawattana, Sarunya; Kimura, Masaomi

    2016-01-01

    A graph is an effective form of data representation used to summarize complex information. Explicit information such as the relationship between the X- and Y-axes can be easily extracted from a graph by applying human intelligence. However, implicit knowledge such as information obtained from other related concepts in an ontology also resides in…

  9. Extracting information of fixational eye movements through pupil tracking

    Science.gov (United States)

    Xiao, JiangWei; Qiu, Jian; Luo, Kaiqin; Peng, Li; Han, Peng

    2018-01-01

    Human eyes are never completely static, even when they are fixating a stationary point. These irregular, small movements, which consist of micro-tremors, micro-saccades and drifts, can prevent the fading of the images that enter our eyes. The importance of researching fixational eye movements has been experimentally demonstrated recently. However, the characteristics of fixational eye movements and their roles in visual processing have not been explained clearly, because these signals could so far hardly be extracted completely. In this paper, we developed a new eye movement detection device with a high-speed camera. The device includes a beam splitter mirror, an infrared light source and a high-speed digital video camera with a frame rate of 200 Hz. To avoid the influence of head shaking, we made the device wearable by fixing the camera on a safety helmet. Using this device, pupil tracking experiments were conducted. By localizing the pupil center and applying spectrum analysis, the envelope frequency spectra of micro-saccades, micro-tremors and drifts are clearly revealed. The experimental results show that the device is feasible and effective, so that it can be applied in further characteristic analysis.
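
    The analysis chain described here, localizing the pupil centre frame by frame and examining the frequency content of its displacement, can be sketched on a synthetic trace; only the 200 Hz frame rate is taken from the abstract, and the signal components below are fabricated for illustration.

        # Sketch of analysing a pupil-centre displacement trace: synthesize a signal
        # with drift, a tremor-like 80 Hz component and occasional micro-saccade jumps,
        # then inspect its spectrum. Only the 200 Hz sampling rate comes from the abstract.
        import numpy as np

        fs = 200.0                                   # camera frame rate, Hz
        t = np.arange(0, 5, 1 / fs)
        rng = np.random.default_rng(0)
        drift = 0.02 * np.cumsum(rng.normal(size=t.size))
        tremor = 0.005 * np.sin(2 * np.pi * 80 * t)  # fabricated tremor-like component
        saccades = np.zeros_like(t)
        saccades[::400] = 0.3                        # a jump roughly every two seconds
        displacement = drift + tremor + np.cumsum(saccades)

        spectrum = np.abs(np.fft.rfft(displacement - displacement.mean()))
        freqs = np.fft.rfftfreq(displacement.size, d=1 / fs)
        band = freqs > 40
        peak = freqs[band][np.argmax(spectrum[band])]
        print(f"dominant component above 40 Hz near {peak:.1f} Hz")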

  10. Extracting Social Networks and Contact Information From Email and the Web

    National Research Council Canada - National Science Library

    Culotta, Aron; Bekkerman, Ron; McCallum, Andrew

    2005-01-01

    ...-suited for such information extraction tasks. By recursively calling itself on new people discovered on the Web, the system builds a social network with multiple degrees of separation from the user...

  11. Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media

    OpenAIRE

    Bhargava, Preeti; Spasojevic, Nemanja; Hu, Guoning

    2017-01-01

    In this paper, we describe the Lithium Natural Language Processing (NLP) system - a resource-constrained, high- throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and acad...

  12. Toward a Standards-Compliant Genomic and Metagenomic Publication Record

    DEFF Research Database (Denmark)

    Garrity, GM; Field, D; Kyrpides, N

    2008-01-01

    Increasingly, we are aware as a community of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a way that tightly integrates contextual data with traditional literature in a machine-readable way...... is in the midst of a publishing revolution. This revolution is marked by a growing shift away from a traditional dichotomy between "journal articles" and "database entries" and an increasing adoption of hybrid models of collecting and disseminating scientific information. With respect to genomes and metagenomes...... or communities) such as the call by the GSC for a central repository of Standard Operating Procedures describing the genomic annotation pipelines of the major sequencing centers. We argue that such an "eJournal," published under the Open Access paradigm by the GSC, could be an attractive publishing forum...

  13. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters

    Science.gov (United States)

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images. The accuracy of remote sensing thematic information depends on this extraction. On the basis of WorldView-2 high-resolution data, and using the optimal segmentation parameters method for object-oriented image segmentation and high-resolution image information extraction, the following processes were conducted in this study. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed with the use of control variables and the combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting the information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme. PMID:27362762

  14. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections.

    Directory of Open Access Journals (Sweden)

    Samuele Bovo

    Full Text Available Shotgun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and the porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18), previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus-infected pigs, providing information that could be important to define the occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets not originally derived from metagenomics experiments for viral metagenomics analyses in a livestock species.
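
    The first step of such read mining, pulling host-unmapped reads out of an alignment for downstream viral database searches, is routine; the sketch below uses pysam on a hypothetical BAM file aligned to the pig reference genome and is not the pipeline used in the study.

        # Hedged sketch: extract reads that did not map to the host (pig) reference from
        # a BAM file and write them to FASTA for later viral database searches.
        # File names are hypothetical.
        import pysam

        kept = 0
        with pysam.AlignmentFile("pig_pool.bam", "rb") as bam, open("unmapped.fasta", "w") as out:
            for read in bam.fetch(until_eof=True):    # until_eof also yields unmapped reads
                if read.is_unmapped and not read.is_secondary and not read.is_supplementary:
                    out.write(f">{read.query_name}\n{read.query_sequence}\n")
                    kept += 1
        print(kept, "host-unmapped reads written")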

  15. Overview of ImageCLEF 2017: information extraction from images

    OpenAIRE

    Ionescu, Bogdan; Müller, Henning; Villegas, Mauricio; Arenas, Helbert; Boato, Giulia; Dang Nguyen, Duc Tien; Dicente Cid, Yashin; Eickhoff, Carsten; Seco de Herrera, Alba G.; Gurrin, Cathal; Islam, Bayzidul; Kovalev, Vassili; Liauchuk, Vitali; Mothe, Josiane; Piras, Luca

    2017-01-01

    This paper presents an overview of the ImageCLEF 2017 evaluation campaign, an event that was organized as part of the CLEF (Conference and Labs of the Evaluation Forum) labs 2017. ImageCLEF is an ongoing initiative (started in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval for providing information access to collections of images in various usage scenarios and domains. In 2017, the 15th edition of ImageCLEF, three main tasks were proposed and one pil...

  16. Statistical techniques to extract information during SMAP soil moisture assimilation

    Science.gov (United States)

    Kolassa, J.; Reichle, R. H.; Liu, Q.; Alemohammad, S. H.; Gentine, P.

    2017-12-01

    Statistical techniques permit the retrieval of soil moisture estimates in a model climatology while retaining the spatial and temporal signatures of the satellite observations. As a consequence, the need for bias correction prior to an assimilation of these estimates is reduced, which could result in a more effective use of the independent information provided by the satellite observations. In this study, a statistical neural network (NN) retrieval algorithm is calibrated using SMAP brightness temperature observations and modeled soil moisture estimates (similar to those used to calibrate the SMAP Level 4 DA system). Daily values of surface soil moisture are estimated using the NN and then assimilated into the NASA Catchment model. The skill of the assimilation estimates is assessed based on a comprehensive comparison to in situ measurements from the SMAP core and sparse network sites as well as the International Soil Moisture Network. The NN retrieval assimilation is found to significantly improve the model skill, particularly in areas where the model does not represent processes related to agricultural practices. Additionally, the NN method is compared to assimilation experiments using traditional bias correction techniques. The NN retrieval assimilation is found to more effectively use the independent information provided by SMAP resulting in larger model skill improvements than assimilation experiments using traditional bias correction techniques.
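
    To make the retrieval step concrete, here is an illustrative sketch of a small neural-network regression from brightness temperatures to soil moisture using scikit-learn; the synthetic data, feature choice and network size are assumptions, and this is not the SMAP Level 4 or study algorithm.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)
      # Synthetic stand-in data: H- and V-polarized brightness temperatures (K)
      # and a loosely related surface soil moisture target (m3/m3).
      tb = rng.uniform(200.0, 290.0, size=(5000, 2))
      sm = 0.45 - 0.0015 * tb.mean(axis=1) + rng.normal(0.0, 0.02, size=5000)

      nn = make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0))
      nn.fit(tb, sm)                       # calibration against model soil moisture
      retrieval = nn.predict(tb[:5])       # daily retrievals would then be assimilated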

  17. Research on Crowdsourcing Emergency Information Extraction Based on Events' Frame

    Science.gov (United States)

    Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi

    2018-01-01

    At present, common information extraction methods cannot accurately extract structured emergency event information, general information retrieval tools cannot completely identify emergency geographic information, and neither approach offers an accurate assessment of the extracted results. This paper therefore proposes an emergency information collection technology based on an event framework to address the problem of emergency information extraction. It mainly comprises an emergency information extraction model (EIEM), a complete address recognition method (CARM) and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in a structured form and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and a shortest path algorithm to join toponym fragments into a full address. AEMEI analyzes the results for an emergency event and summarizes the advantages and disadvantages of the event framework. Experiments show that the event-frame technology can solve the problem of emergency information extraction and provides reference cases for other applications. When an emergency disaster is imminent, the relevant departments can query data on past emergencies and make arrangements ahead of time for disaster prevention and mitigation, reducing casualties and property damage. This is of great significance to the state and society.

  18. [Extraction of management information from the national quality assurance program].

    Science.gov (United States)

    Stausberg, Jürgen; Bartels, Claus; Bobrowski, Christoph

    2007-07-15

    Starting with clinically motivated projects, the national quality assurance program has established a legally binding framework. Annual feedback of results is an important means of quality control. The annual reports cover quality-related information with high granularity. A synopsis for corporate management is missing, however. Therefore, the results of the University Clinics in Greifswald, Germany, have been analyzed and aggregated to support hospital management. Strengths were identified by the ranking of results within the state for each quality indicator, weaknesses by the comparison with national reference values. The assessment was aggregated per clinical discipline and per category (indication, process, and outcome). A composite of quality indicators has been demanded multiple times, but a coherent concept is still missing. The method presented establishes a plausible summary of the strengths and weaknesses of a hospital from the point of view of the national quality assurance program. Nevertheless, further adaptation of the program is needed to better assist corporate management.

  19. Human milk metagenome: a functional capacity analysis

    Science.gov (United States)

    2013-01-01

    Background Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were searched for within human milk. Results The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions Our results further expand the complexity of the human milk metagenome and reinforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the functionality of the human

  20. Extracting implicit information in English advertising texts with phonetic and lexical-morphological means

    Directory of Open Access Journals (Sweden)

    Traikovskaya Natalya Petrovna

    2015-12-01

    Full Text Available The article deals with phonetic and lexical-morphological language means involved in extracting implicit information from English-language advertising texts aimed at men and women. The functioning of phonetic means of the English language is not the basis for the implication of information in advertising texts; lexical and morphological means instead act as markers of relevant information and as activators of implicit information in advertising texts.

  1. High-resolution metagenomics targets major functional types in complex microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    Kalyuzhnaya, Marina G.; Lapidus, Alla; Ivanova, Natalia; Copeland, Alex C.; McHardy, Alice C.; Szeto, Ernest; Salamov, Asaf; Grigoriev, Igor V.; Suciu, Dominic; Levine, Samuel R.; Markowitz, Victor M.; Rigoutsos, Isidore; Tringe, Susannah G.; Bruce, David C.; Richardson, Paul M.; Lidstrom, Mary E.; Chistoserdova, Ludmila

    2009-08-01

    Most microbes in the biosphere remain uncultured and unknown. Whole genome shotgun (WGS) sequencing of environmental DNA (metagenomics) allows glimpses into genetic and metabolic potentials of natural microbial communities. However, in communities of high complexity metagenomics fail to link specific microbes to specific ecological functions. To overcome this limitation, we selectively targeted populations involved in oxidizing single-carbon (C{sub 1}) compounds in Lake Washington (Seattle, USA) by labeling their DNA via stable isotope probing (SIP), followed by WGS sequencing. Metagenome analysis demonstrated specific sequence enrichments in response to different C{sub 1} substrates, highlighting ecological roles of individual phylotypes. We further demonstrated the utility of our approach by extracting a nearly complete genome of a novel methylotroph Methylotenera mobilis, reconstructing its metabolism and conducting genome-wide analyses. This approach allowing high-resolution genomic analysis of ecologically relevant species has the potential to be applied to a wide variety of ecosystems.

  2. Post-processing of Deep Web Information Extraction Based on Domain Ontology

    Directory of Open Access Journals (Sweden)

    PENG, T.

    2013-11-01

    Full Text Available Many methods are used to extract and process query results in the deep Web, relying on the different structures of Web pages and the various design modes of databases; however, some semantic meanings and relations are ignored. In this paper, we therefore present an approach for post-processing deep Web query results based on a domain ontology that can utilize these semantic meanings and relations. A block identification model (BIM) based on node similarity is defined to extract data blocks that are relevant to a specific domain after reducing noisy nodes. The feature vector of domain books is obtained by a result set extraction model (RSEM) based on the vector space model (VSM). RSEM, in combination with BIM, builds the domain ontology on books, which not only removes the limitation imposed by Web page structures when extracting data information but also makes use of the semantic meanings of the domain ontology. After extracting the basic information of Web pages, a ranking algorithm is adopted to offer an ordered list of data records to users. Experimental results show that BIM and RSEM extract data blocks and build the domain ontology accurately, and that relevant data records and basic information are extracted and ranked. The precision and recall achieved show that our proposed method is feasible and efficient.
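
    The vector-space step can be illustrated with a short sketch that scores candidate page blocks against a domain "books" profile by TF-IDF cosine similarity; the profile text and example blocks are invented for illustration, and this is not the BIM/RSEM implementation.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      domain_profile = "book title author publisher isbn edition price pages"
      blocks = [
          "Title: Deep Learning  Author: Ian Goodfellow  Publisher: MIT Press  ISBN 9780262035613",
          "Login | Register | Shopping cart (0 items) | Help center",
      ]

      vec = TfidfVectorizer().fit([domain_profile] + blocks)
      scores = cosine_similarity(vec.transform(blocks), vec.transform([domain_profile])).ravel()
      # The noisy navigation block receives a near-zero score and can be filtered out.
      ranked = sorted(zip(scores, blocks), reverse=True)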

  3. Reconstruction of ribosomal RNA genes from metagenomic data.

    Directory of Open Access Journals (Sweden)

    Lu Fan

    Full Text Available Direct sequencing of environmental DNA (metagenomics) has a great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

  4. a Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    Science.gov (United States)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important ways to extract post-disaster collapsed building information, due to its extreme versatility and almost all-weather, day-and-night working capability. Because the inherent statistical distribution of speckle in SAR images has not been used to extract collapsed building information, this paper proposes a novel texture feature based on statistical models of SAR images to extract collapsed buildings. In the proposed feature, the texture parameter of the G0 distribution estimated from SAR images is used to reflect the uniformity of the target and thereby extract collapsed buildings. This feature not only takes the statistical distribution of SAR images into account, providing a more accurate description of object texture, but is also applicable to extracting collapsed building information from single-, dual- or full-polarization SAR data. RADARSAT-2 data of the Yushu earthquake, acquired on April 21, 2010, are used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is analyzed, which provides decision support for data selection in collapsed building information extraction.

  5. A method for automating the extraction of specialized information from the web

    NARCIS (Netherlands)

    Lin, L.; Liotta, A.; Hippisley, A.; Hao, Y.; Liu, J.; Wang, Y.; Cheung, Y-M.; Yin, H.; Jiao, L.; Ma, j.; Jiao, Y-C.

    2005-01-01

    The World Wide Web can be viewed as a gigantic distributed database including millions of interconnected hosts some of which publish information via web servers or peer-to-peer systems. We present here a novel method for the extraction of semantically rich information from the web in a fully

  6. Information analysis of iris biometrics for the needs of cryptology key extraction

    Directory of Open Access Journals (Sweden)

    Adamović Saša

    2013-01-01

    Full Text Available The paper presents a rigorous analysis of iris biometric information for the synthesis of an optimized system for the extraction of a high-quality cryptology key. Using estimations of local entropy and mutual information, the segments of the iris most suitable for this purpose were identified. To optimize parameters, the corresponding wavelet transforms were tuned to obtain the highest possible entropy and lower mutual information in the transform domain, which sets the framework for the synthesis of systems for the extraction of truly random sequences from iris biometrics without compromising authentication properties. [Project of the Ministry of Science of the Republic of Serbia, no. TR32054 and no. III44006]
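
    A numerical sketch of the two quantities named above, computed from image histograms with NumPy; the bin count and the idea of comparing two iris segments are assumptions made here for illustration, not the paper's procedure.

      import numpy as np

      def entropy(region, bins=64):
          """Shannon entropy (bits) of the intensity histogram of an iris segment."""
          counts, _ = np.histogram(region, bins=bins)
          p = counts / counts.sum()
          p = p[p > 0]
          return float(-(p * np.log2(p)).sum())

      def mutual_information(a, b, bins=64):
          """Mutual information (bits) between two equally sized iris segments."""
          joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
          pxy = joint / joint.sum()
          px, py = pxy.sum(axis=1), pxy.sum(axis=0)
          nz = pxy > 0
          return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

      # Hypothetical usage on two unwrapped iris bands of equal shape:
      # h = entropy(band_inner); mi = mutual_information(band_inner, band_outer)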

  7. Tapping uncultured microorganisms through metagenomics for drug ...

    African Journals Online (AJOL)

    African Journal of Biotechnology ... Microorganisms are major source of bioactive natural products, and several ... This review highlights the recent methodologies, limitations, and applications of metagenomics for the discovery of new drugs.

  8. Tapping uncultured microorganisms through metagenomics for drug ...

    African Journals Online (AJOL)

    bdelnasser

    reached the market using this new technology. For these reasons and others, the interest in natural products has ..... Functional metagenomic library screening strategy ..... Bertrand H, Poly F, Van VT, Lombard N, Nalin R, Vogel TM, Simonet P.

  9. Comparative metagenomics of the Red Sea

    KAUST Repository

    Mineta, Katsuhiko

    2016-01-26

    Metagenome produces a tremendous amount of data that comes from the organisms living in the environments. This big data enables us to examine not only microbial genes but also the community structure, interaction and adaptation mechanisms at the specific location and condition. The Red Sea has several unique characteristics such as high salinity, high temperature and low nutrition. These features must contribute to form the unique microbial community during the evolutionary process. Since 2014, we started monthly samplings of the metagenomes in the Red Sea under KAUST-CCF project. In collaboration with Kitasato University, we also collected the metagenome data from the ocean in Japan, which shows contrasting features to the Red Sea. Therefore, the comparative metagenomics of those data provides a comprehensive view of the Red Sea microbes, leading to identify key microbes, genes and networks related to those environmental differences.

  10. Challenges and Opportunities of Airborne Metagenomics

    OpenAIRE

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events su...

  11. Challenges and Opportunities of Airborne Metagenomics

    KAUST Repository

    Behzad, H.

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles.

  12. MedTime: a temporal information extraction system for clinical narratives.

    Science.gov (United States)

    Lin, Yu-Kai; Chen, Hsinchun; Brown, Randall A

    2013-12-01

    Temporal information extraction from clinical narratives is of critical importance to many clinical applications. We participated in the EVENT/TIMEX3 track of the 2012 i2b2 clinical temporal relations challenge, and presented our temporal information extraction system, MedTime. MedTime comprises a cascade of rule-based and machine-learning pattern recognition procedures. It achieved a micro-averaged f-measure of 0.88 in both the recognitions of clinical events and temporal expressions. We proposed and evaluated three time normalization strategies to normalize relative time expressions in clinical texts. The accuracy was 0.68 in normalizing temporal expressions of dates, times, durations, and frequencies. This study demonstrates and evaluates the integration of rule-based and machine-learning-based approaches for high performance temporal information extraction from clinical narratives. Copyright © 2013 Elsevier Inc. All rights reserved.
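
    To give a flavour of the rule-based half of such a cascade, the toy sketch below recognizes a few classes of temporal expressions with regular expressions; the pattern set is a simplification invented here and is not MedTime's rule base.

      import re

      PATTERNS = {
          "DATE": re.compile(r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b"),
          "DURATION": re.compile(r"\b(?:\d+|several|a few)\s+(?:day|week|month|year)s?\b", re.I),
          "FREQUENCY": re.compile(r"\b(?:b\.?i\.?d\.?|t\.?i\.?d\.?|once daily|twice daily)\b", re.I),
      }

      def extract_timex(text):
          """Return (label, surface form, span) for each matched temporal expression."""
          return [(label, m.group(), m.span())
                  for label, rx in PATTERNS.items() for m in rx.finditer(text)]

      print(extract_timex("Admitted on 03/14/2012 with chest pain for 2 weeks; metoprolol b.i.d."))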

  13. Research of building information extraction and evaluation based on high-resolution remote-sensing imagery

    Science.gov (United States)

    Cao, Qiong; Gu, Lingjia; Ren, Ruizhi; Wang, Lang

    2016-09-01

    Building extraction is currently an important application of high-resolution remote sensing imagery. At present, quite a few algorithms are available for detecting building information; however, most of them still have obvious disadvantages, such as ignoring spectral information or trading extraction rate against extraction accuracy. The purpose of this research is to develop an effective method to detect building information from Chinese GF-1 data. Firstly, image preprocessing is used to normalize the image and image enhancement is used to highlight the useful information in it. Secondly, multi-spectral information is analyzed. Subsequently, an improved morphological building index (IMBI) based on remote sensing imagery is proposed to obtain the candidate building objects. Furthermore, in order to refine the building objects and remove false objects, post-processing (e.g., shape features, the vegetation index and the water index) is employed. To validate the effectiveness of the proposed algorithm, the omission error (OE), commission error (CE), overall accuracy (OA) and Kappa are used in the final evaluation. The proposed method can not only effectively use spectral information and other basic features, but also avoid extracting excessive interference details from high-resolution remote sensing images. Compared to the original MBI algorithm, the proposed method reduces the OE by 33.14%; at the same time, the Kappa increases by 16.09%. In experiments, IMBI achieved satisfactory results and outperformed other algorithms in terms of both accuracy and visual inspection.
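
    For readers unfamiliar with morphological building indices, the sketch below shows the generic idea, a white top-hat profile over several structuring-element sizes averaged into an index; it is a simplified stand-in, not the IMBI of the paper, and the radii and threshold are assumptions.

      import numpy as np
      from skimage.morphology import white_tophat, disk

      def building_index(brightness, radii=(3, 7, 11, 15)):
          """Average white top-hat response over several structuring-element sizes.

          brightness: 2D float array, e.g. the maximum over the visible bands of a GF-1 scene.
          """
          profile = [white_tophat(brightness, disk(r)) for r in radii]
          return np.mean(profile, axis=0)

      # Candidate building pixels via a simple threshold (the value is an assumption);
      # shape, vegetation and water indices would then remove false positives.
      # candidates = building_index(brightness) > 0.05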

  14. Binning sequences using very sparse labels within a metagenome

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Full Text Available Abstract Background In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. Conclusion In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of
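
    Composition-based binning of the kind described above rests on per-fragment composition vectors; the sketch below computes tetranucleotide frequencies, which is the usual input representation for such methods, though the GSOM training itself is not shown and is not reproduced from the paper.

      from itertools import product
      import numpy as np

      KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
      INDEX = {k: i for i, k in enumerate(KMERS)}

      def tetranucleotide_freq(seq):
          """256-dimensional tetranucleotide frequency vector of a DNA fragment."""
          counts = np.zeros(len(KMERS))
          s = seq.upper()
          for i in range(len(s) - 3):
              k = s[i:i + 4]
              if k in INDEX:            # skips windows containing N or other ambiguity codes
                  counts[INDEX[k]] += 1
          total = counts.sum()
          return counts / total if total else counts

      # Seed fragments (flanking 16S rRNA) and unlabelled fragments are all mapped to
      # such vectors before clustering / self-organising map training.
      v = tetranucleotide_freq("ATGCGTACGTTAGCATGCACGTNNNACGTAGGCT")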

  15. Information Extraction of High-Resolution Remotely Sensed Image Based on Multiresolution Segmentation

    Directory of Open Access Journals (Sweden)

    Peng Shao

    2014-08-01

    Full Text Available The principle of multiresolution segmentation was presented in detail in this study, and the Canny algorithm was applied for edge detection of a remotely sensed image based on this principle. The target image was divided into regions based on object-oriented multiresolution segmentation and edge detection. Furthermore, an object hierarchy was created, and a series of features (water bodies, vegetation, roads, residential areas, bare land) and other information were extracted using spectral and geometrical features. The results indicate that edge detection has a positive effect on multiresolution segmentation, and the overall accuracy of information extraction, assessed with a confusion matrix, reaches 94.6%.
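
    A small sketch of combining edge detection with a generic segmentation in scikit-image; felzenszwalb stands in for the multiresolution segmentation used in the study, and the test image and parameters are placeholders rather than the authors' settings.

      import numpy as np
      from skimage import data, feature, segmentation

      image = data.camera().astype(float) / 255.0           # placeholder for a remote sensing band
      edges = feature.canny(image, sigma=2.0)               # Canny edge map
      labels = segmentation.felzenszwalb(image, scale=100, sigma=0.8, min_size=50)
      boundaries = segmentation.find_boundaries(labels)

      # Fraction of object boundaries supported by an edge response, a crude check of
      # how well the segmentation follows image edges.
      support = np.logical_and(edges, boundaries).sum() / max(int(boundaries.sum()), 1)
      print(round(float(support), 2))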

  16. End-to-end information extraction without token-level supervision

    DEFF Research Database (Denmark)

    Palm, Rasmus Berg; Hovy, Dirk; Laws, Florian

    2017-01-01

    Most state-of-the-art information extraction approaches rely on token-level labels to find the areas of interest in text. Unfortunately, these labels are time-consuming and costly to create, and consequently, not available for many real-life IE tasks. To make matters worse, token-level labels...... and output text. We evaluate our model on the ATIS data set, MIT restaurant corpus and the MIT movie corpus and compare to neural baselines that do use token-level labels. We achieve competitive results, within a few percentage points of the baselines, showing the feasibility of E2E information extraction...

  17. Extraction Method for Earthquake-Collapsed Building Information Based on High-Resolution Remote Sensing

    International Nuclear Information System (INIS)

    Chen, Peng; Wu, Jian; Liu, Yaolin; Wang, Jing

    2014-01-01

    At present, the extraction of earthquake disaster information from remote sensing data relies on visual interpretation. However, this technique cannot effectively and quickly obtain precise and efficient information for earthquake relief and emergency management. Collapsed buildings in the town of Zipingpu after the Wenchuan earthquake were used as a case study to validate two kinds of rapid extraction methods for earthquake-collapsed building information based on pixel-oriented and object-oriented theories. The pixel-oriented method is based on multi-layer regional segments that embody the core layers and segments of the object-oriented method. The key idea is to mask layer by layer all image information, including that on the collapsed buildings. Compared with traditional techniques, the pixel-oriented method is innovative because it allows considerably rapid computer processing. As for the object-oriented method, a multi-scale segmentation algorithm was applied to build a three-layer hierarchy. By analyzing the spectrum, texture, shape, location, and context of individual object classes in different layers, a fuzzy rule system was established for the extraction of earthquake-collapsed building information. We compared the two sets of results using three variables: precision assessment, visual effect, and principle. Both methods can extract earthquake-collapsed building information quickly and accurately. The object-oriented method successfully overcomes the salt-and-pepper noise caused by the spectral diversity of high-resolution remote sensing data and solves the problem of "same object, different spectra" and that of "same spectrum, different objects". With an overall accuracy of 90.38%, the method achieves more scientific and accurate results compared with the pixel-oriented method (76.84%). The object-oriented image analysis method can be extensively applied in the extraction of earthquake disaster information based on high-resolution remote sensing

  18. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    Science.gov (United States)

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single

  19. Terrain Extraction by Integrating Terrestrial Laser Scanner Data and Spectral Information

    Science.gov (United States)

    Lau, C. L.; Halim, S.; Zulkepli, M.; Azwan, A. M.; Tang, W. L.; Chong, A. K.

    2015-10-01

    The extraction of true terrain points from unstructured laser point cloud data is an important process for producing an accurate digital terrain model (DTM). However, most spatial filtering methods use only the geometrical data to discriminate terrain points from non-terrain points. Point cloud filtering can also be improved by using the spectral information available with some scanners. Therefore, the objective of this study is to investigate the effectiveness of using the three channels (red, green and blue) of the colour image captured by the built-in digital camera available in some Terrestrial Laser Scanners (TLS) for terrain extraction. In this study, the data acquisition was conducted at a mini replica landscape on the Universiti Teknologi Malaysia (UTM) Skudai campus using a Leica ScanStation C10. The spectral information of the coloured point clouds from selected sample classes was extracted for spectral analysis. Coloured points falling within the corresponding preset spectral thresholds are identified as belonging to that specific feature class. This process of terrain extraction was implemented in purpose-written Matlab code. The results demonstrate that a passive image with higher spectral resolution is required to improve the output, because the low quality of the colour images captured by the sensor leads to low separability in spectral reflectance. In conclusion, this study shows that spectral information can be used as a parameter for terrain extraction.
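
    A minimal sketch of the spectral-threshold step, assuming the coloured point cloud has been exported as rows of X, Y, Z, R, G, B; the threshold ranges are placeholders that would in practice come from the sampled terrain classes, and this is not the study's Matlab implementation.

      import numpy as np

      def filter_by_spectral_threshold(points, r_rng=(90, 160), g_rng=(70, 140), b_rng=(50, 120)):
          """Keep points whose RGB values fall inside the preset per-channel ranges."""
          r, g, b = points[:, 3], points[:, 4], points[:, 5]
          keep = ((r >= r_rng[0]) & (r <= r_rng[1]) &
                  (g >= g_rng[0]) & (g <= g_rng[1]) &
                  (b >= b_rng[0]) & (b <= b_rng[1]))
          return points[keep]

      # Hypothetical usage with a TLS export (one "X Y Z R G B" row per point):
      # cloud = np.loadtxt("scanstation_c10_export.txt")
      # terrain = filter_by_spectral_threshold(cloud)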

  20. Natural history bycatch: a pipeline for identifying metagenomic sequences in RADseq data

    Directory of Open Access Journals (Sweden)

    Iris Holmes

    2018-04-01

    Full Text Available Background Reduced representation genomic datasets are increasingly becoming available from a variety of organisms. These datasets do not target specific genes, and so may contain sequences from parasites and other organisms present in the target tissue sample. In this paper, we demonstrate that (1) RADseq datasets can be used for exploratory analysis of tissue-specific metagenomes, and (2) tissue collections house complete metagenomic communities, which can be investigated and quantified by a variety of techniques. Methods We present an exploratory method for mining metagenomic “bycatch” sequences from a range of host tissue types. We use a combination of the pyRAD assembly pipeline, NCBI’s blastn software, and custom R scripts to isolate metagenomic sequences from RADseq type datasets. Results When we focus on sequences that align with existing references in NCBI’s GenBank, we find that between three and five percent of identifiable double-digest restriction site associated DNA (ddRAD) sequences from host tissue samples are from phyla known to contain blood parasites. In addition to tissue samples, we examine ddRAD sequences from metagenomic DNA extracted from snake and lizard hind-gut samples. We find that the sequences recovered from these samples match with expected bacterial and eukaryotic gut microbiome phyla. Discussion Our results suggest that (1) museum tissue banks originally collected for host DNA archiving are also preserving valuable parasite and microbiome communities, (2) that publicly available RADseq datasets may include metagenomic sequences that could be explored, and (3) that restriction site approaches are a useful exploratory technique to identify microbiome lineages that could be missed by primer-based approaches.
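
    To illustrate the identification step, the sketch below keeps the best blastn hit per locus from standard tabular output (-outfmt 6) and tallies the subjects; the file name and thresholds are assumptions, the taxonomic roll-up to phyla is left out, and this is not the authors' pyRAD/R pipeline.

      import pandas as pd

      COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

      hits = pd.read_csv("radseq_loci_vs_nt.blastn.tsv", sep="\t", names=COLS)
      best = hits.sort_values("bitscore", ascending=False).drop_duplicates("qseqid")
      confident = best[(best.pident >= 90) & (best.evalue <= 1e-10)]
      print(confident["sseqid"].value_counts().head(20))   # subjects would then be mapped to taxa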

  1. Denoising PCR-amplified metagenome data

    Directory of Open Access Journals (Sweden)

    Rosen Michael J

    2012-10-01

    Full Text Available Abstract Background PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. Results We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. Conclusions DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.

  2. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    Science.gov (United States)

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-04-06

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.

  3. Information retrieval and terminology extraction in online resources for patients with diabetes.

    Science.gov (United States)

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a mean for information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes need access to various online resources (on foreign and/or native language) searching for information on self-education of basic diabetic knowledge, on self-care activities regarding importance of dietetic food, medications, physical exercises and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers, can help in building terminology lists or list of "browsing phrases" useful in information retrieval or in document indexing. Specific terminology lists represent an intermediate step between free text search and controlled vocabulary, between user's demands and existing online resources in native and foreign language. The research aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and divided into three interrelated parts: i) comparison of professional and popular terminology use ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts iii) comparison and evaluation of extracted terminology performed on English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: list created by professional medical person, list of highly professional vocabulary contained in MeSH and list created by non-medical persons, made as intersection of 15 lists. Results report on use of popular and professional terminology in online diabetes resources, on evaluation of automatically extracted terminology candidates in English and Croatian texts and on comparison of statistical and hybrid extraction methods in English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall

  4. OpenCV-Based Nanomanipulation Information Extraction and the Probe Operation in SEM

    Directory of Open Access Journals (Sweden)

    Dongjie Li

    2015-02-01

    Full Text Available For an established telenanomanipulation system, the method of extracting location information and the strategies of probe operation were studied in this paper. First, a machine learning algorithm from OpenCV was used to extract location information from SEM images, so that nanowires and the probe in SEM images can be automatically tracked and the region of interest (ROI) can be marked quickly; the locations of the nanowire and the probe are then extracted from the ROI. To study the probe operation strategy, the Van der Waals force between the probe and a nanowire was computed to obtain the relevant operating parameters. With these operating parameters, the manipulation of the nanowire can be rehearsed in a 3D virtual environment and an optimal path of the probe can be obtained, after which the actual probe runs automatically under the telenanomanipulation system's control. Finally, experiments were carried out to verify the above methods, and the results show that the designed methods achieved the expected effect.
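
    As a simplified stand-in for the OpenCV-based localization described above (the authors use a trained machine learning detector; plain template matching is shown here only to illustrate ROI extraction), with hypothetical image files:

      import cv2

      frame = cv2.imread("sem_frame.png", cv2.IMREAD_GRAYSCALE)          # hypothetical SEM frame
      template = cv2.imread("probe_template.png", cv2.IMREAD_GRAYSCALE)  # cropped probe tip

      response = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
      _, score, _, top_left = cv2.minMaxLoc(response)
      h, w = template.shape
      roi = (top_left[0], top_left[1], w, h)   # x, y, width, height of the probe ROI
      print(roi, round(score, 2))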

  5. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Yu-Wei [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Simmons, Blake A. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Singer, Steven W. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-29

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  6. Marine Metagenome as A Resource for Novel Enzymes

    KAUST Repository

    Alma’ abadi, Amani D.; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using

  7. Methods to extract information on the atomic and molecular states from scientific abstracts

    International Nuclear Information System (INIS)

    Sasaki, Akira; Ueshima, Yutaka; Yamagiwa, Mitsuru; Murata, Masaki; Kanamaru, Toshiyuki; Shirado, Tamotsu; Isahara, Hitoshi

    2005-01-01

    We propose a new application of information technology to recognize and extract expressions of atomic and molecular states from electronic forms of scientific abstracts. The present results will help scientists to understand atomic states as well as the physics discussed in the articles. Combined with internet search engines, it will make it possible to collect not only atomic and molecular data but also broader scientific information over a wide range of research fields. (author)

  8. System and method for extracting physiological information from remotely detected electromagnetic radiation

    NARCIS (Netherlands)

    2016-01-01

    The present invention relates to a device and a method for extracting physiological information indicative of at least one health symptom from remotely detected electromagnetic radiation. The device comprises an interface (20) for receiving a data stream comprising remotely detected image data

  9. System and method for extracting physiological information from remotely detected electromagnetic radiation

    NARCIS (Netherlands)

    2015-01-01

    The present invention relates to a device and a method for extracting physiological information indicative of at least one health symptom from remotely detected electromagnetic radiation. The device comprises an interface (20) for receiving a data stream comprising remotely detected image data

  10. Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report

    Energy Technology Data Exchange (ETDEWEB)

    Kegelmeyer, Philip W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shead, Timothy M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Dunlavy, Daniel M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2013-09-01

    This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods.

  11. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    Based on the frequent subtree mining algorithm, this paper proposes a construction scheme for a web page comment information extraction system, referred to as the FSM system. The overall system architecture and its modules are briefly introduced, the core of the system is then described in detail, and finally a system prototype is given.

  12. EXTRACT

    DEFF Research Database (Denmark)

    Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra

    2016-01-01

    The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have the...... and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed.Database URL: https://extract.hcmr.gr/......., organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual...

  13. Semi-automatic building extraction in informal settlements from high-resolution satellite imagery

    Science.gov (United States)

    Mayunga, Selassie David

    The extraction of man-made features from digital remotely sensed images is considered as an important step underpinning management of human settlements in any country. Man-made features and buildings in particular are required for varieties of applications such as urban planning, creation of geographical information systems (GIS) databases and Urban City models. The traditional man-made feature extraction methods are very expensive in terms of equipment, labour intensive, need well-trained personnel and cannot cope with changing environments, particularly in dense urban settlement areas. This research presents an approach for extracting buildings in dense informal settlement areas using high-resolution satellite imagery. The proposed system uses a novel strategy of extracting building by measuring a single point at the approximate centre of the building. The fine measurement of the building outlines is then effected using a modified snake model. The original snake model on which this framework is based, incorporates an external constraint energy term which is tailored to preserving the convergence properties of the snake model; its use to unstructured objects will negatively affect their actual shapes. The external constrained energy term was removed from the original snake model formulation, thereby, giving ability to cope with high variability of building shapes in informal settlement areas. The proposed building extraction system was tested on two areas, which have different situations. The first area was Tungi in Dar Es Salaam, Tanzania where three sites were tested. This area is characterized by informal settlements, which are illegally formulated within the city boundaries. The second area was Oromocto in New Brunswick, Canada where two sites were tested. Oromocto area is mostly flat and the buildings are constructed using similar materials. Qualitative and quantitative measures were employed to evaluate the accuracy of the results as well as the performance
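
    A brief sketch of the seed-point-to-outline idea using the standard active contour in scikit-image; the thesis modifies the snake energy (removing the external constraint term), whereas this uses the library model unchanged, and the seed coordinates, radius and weights are placeholders.

      import numpy as np
      from skimage import data, filters, segmentation

      image = filters.gaussian(data.coins().astype(float), 2.0)   # placeholder for an image chip
      row, col, radius = 125.0, 210.0, 30.0                       # single measured point + radius
      theta = np.linspace(0, 2 * np.pi, 200)
      init = np.column_stack([row + radius * np.sin(theta), col + radius * np.cos(theta)])

      outline = segmentation.active_contour(image, init, alpha=0.015, beta=10.0, gamma=0.001)
      # `outline` is an (N, 2) polygon of (row, col) vertices converged toward the object boundary.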

  14. Viral Metagenomics: MetaView Software

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences of known viral nucleic acid sequence data and enable a user to attempt to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in fasta format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we came up with for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.

  15. Preliminary High-Throughput Metagenome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI?s in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  16. RESEARCH ON REMOTE SENSING GEOLOGICAL INFORMATION EXTRACTION BASED ON OBJECT ORIENTED CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    H. Gao

    2018-04-01

    Full Text Available Northern Tibet belongs to the sub-cold arid climate zone of the plateau. It is rarely visited by people and geological working conditions are very poor; however, the stratum exposures are good and human interference is very small. Therefore, research on the automatic classification and extraction of remote sensing geological information is of typical significance and has good application prospects. Based on object-oriented classification in northern Tibet, using Worldview2 high-resolution remote sensing data combined with tectonic information and image enhancement, the lithological spectral features, shape features, spatial locations and topological relations of various kinds of geological information were mined. By setting thresholds within a hierarchical classification, eight kinds of geological information were classified and extracted. Comparison with existing geological maps shows that the overall accuracy reached 87.8561 %, indicating that the object-oriented classification method is effective and feasible for this study area and provides a new idea for the automatic extraction of remote sensing geological information.

  17. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories.

    Science.gov (United States)

    Yang, Wei; Ai, Tinghua; Lu, Wei

    2018-04-19

    Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourced vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and to interpolate the optimized segments adaptively so that there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors using the area of the Voronoi cell and the length of the triangle edge. The road boundary detection model is then established by integrating the boundary descriptors and trajectory movement features (e.g., direction) through the DT. Third, the boundary detection model is used to detect the road boundary from the DT constructed from the trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information proved to be of higher quality.

  18. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories

    Directory of Open Access Journals (Sweden)

    Wei Yang

    2018-04-01

    Full Text Available Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourcing vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and interpolate the optimization segments adaptively to ensure there are enough tracking points. Second, constructing the DT and the Voronoi diagram within interpolated tracking lines to calculate road boundary descriptors using the area of Voronoi cell and the length of triangle edge. Then, the road boundary detection model is established integrating the boundary descriptors and trajectory movement features (e.g., direction) by DT. Third, using the boundary detection model to detect road boundary from the DT constructed by trajectory lines, and a regional growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information was proved to be of higher quality.
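
    The descriptor step can be sketched with SciPy over projected trace points: Delaunay edge lengths and bounded Voronoi cell areas flag locally sparse sampling near the pavement edge. The random points and the density interpretation are placeholders, and the paper's full detection model and region-growing step are not reproduced.

      import numpy as np
      from scipy.spatial import ConvexHull, Delaunay, Voronoi

      pts = np.random.default_rng(1).uniform(0.0, 100.0, size=(300, 2))  # stand-in for projected GPS points

      tri = Delaunay(pts)
      edges = set()
      for simplex in tri.simplices:
          for i in range(3):
              a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
              edges.add((a, b))
      edge_length = {e: float(np.linalg.norm(pts[e[0]] - pts[e[1]])) for e in edges}

      vor = Voronoi(pts)
      cell_area = {}
      for p, region_idx in enumerate(vor.point_region):
          region = vor.regions[region_idx]
          if region and -1 not in region:                    # bounded cells only
              cell_area[p] = ConvexHull(vor.vertices[region]).volume  # 2D hull volume == area
      # Unusually long edges and large cells indicate sparse sampling, i.e. candidate road boundaries.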

  19. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    Science.gov (United States)

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.

  20. A catalog of the mouse gut metagenome

    DEFF Research Database (Denmark)

    Xiao, Liang; Feng, Qiang; Liang, Suisha

    2015-01-01

    laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human......We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing...... counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies....

  1. Challenges and opportunities of airborne metagenomics.

    Science.gov (United States)

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Metagenomic Detection Methods in Biopreparedness Outbreak Scenarios

    DEFF Research Database (Denmark)

    Karlsson, Oskar Erik; Hansen, Trine; Knutsson, Rickard

    2013-01-01

    In the field of diagnostic microbiology, rapid molecular methods are critically important for detecting pathogens. With rapid and accurate detection, preventive measures can be put in place early, thereby preventing loss of life and further spread of a disease. From a preparedness perspective...... of a clinical sample, creating a metagenome, in a single week of laboratory work. As new technologies emerge, their dissemination and capacity building must be facilitated, and criteria for use, as well as guidelines on how to report results, must be established. This article focuses on the use of metagenomics...

  3. Gene Prediction in Metagenomic Fragments with Deep Learning

    Directory of Open Access Journals (Sweden)

    Shao-Wu Zhang

    2017-01-01

    Full Text Available Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments that come from thousands of different species. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
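
    A highly simplified illustration of the multi-feature fusion idea (not the Meta-MFDL implementation; the deep stacking network is omitted): compute monocodon-usage frequencies and ORF length coverage for a candidate ORF and concatenate them into one vector for a downstream classifier. Sequences and names are hypothetical.

```python
# Sketch of fusing simple ORF features (monocodon usage + ORF length coverage)
# into one vector, in the spirit of multi-feature fusion for gene prediction.
from itertools import product
import numpy as np

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]   # the 64 codons

def orf_features(orf_seq, fragment_len):
    codons = [orf_seq[i:i + 3] for i in range(0, len(orf_seq) - 2, 3)]
    counts = np.array([codons.count(c) for c in CODONS], dtype=float)
    codon_usage = counts / max(len(codons), 1)              # monocodon usage frequencies
    length_coverage = len(orf_seq) / fragment_len           # fraction of the fragment covered
    return np.concatenate([codon_usage, [length_coverage]])

if __name__ == "__main__":
    fragment = "ATGAAACCCGGGTTTTAAATGCCC"                    # toy metagenomic fragment
    orf = fragment[0:18]                                     # hypothetical candidate ORF
    x = orf_features(orf, len(fragment))
    print(x.shape)                                           # (65,) -> input to a classifier
```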

  4. Using Short-Term Enrichments and Metagenomics to Obtain Genomes from uncultured Activated Sludge Microorganisms

    DEFF Research Database (Denmark)

    Karst, Søren Michael; Nielsen, Per Halkjær; Albertsen, Mads

    is that they depend on system-specific reference genomes in order to analyze the vast amounts of data (Albertsen et al., 2012). This limits the application of -omics to environments for which a comprehensive catalogue of reference genomes exists e.g. the human gut. Several strategies for obtaining microbial genomes...... exist today, but their ability to obtain complete genomes from complex microbial communities on a large scale is still inadequate (Lasken, 2012). In theory, conventional metagenomics should be able to recover genomes from complex communities, but in practice the approach is hampered by the presence...... of microdiversity. This leads to fragmented and chimeric de novo assemblies, which prevent the extraction of complete genomes. The new approach presented here involves reducing the impact of microdiversity and increasing genome extraction efficiency by what we term “metagenome triangulation”. The microdiversity...

  5. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    Science.gov (United States)

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of culturable microorganisms in the environment, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  6. Extracting information from two-dimensional electrophoresis gels by partial least squares regression

    DEFF Research Database (Denmark)

    Jessen, Flemming; Lametsch, R.; Bendixen, E.

    2002-01-01

    of all proteins/spots in the gels. In the present study it is demonstrated how information can be extracted by multivariate data analysis. The strategy is based on partial least squares regression followed by variable selection to find proteins that individually or in combination with other proteins vary......Two-dimensional gel electrophoresis (2-DE) produces large amounts of data and extraction of relevant information from these data demands a cautious and time consuming process of spot pattern matching between gels. The classical approach of data analysis is to detect protein markers that appear...... or disappear depending on the experimental conditions. Such biomarkers are found by comparing the relative volumes of individual spots in the individual gels. Multivariate statistical analysis and modelling of 2-DE data for comparison and classification is an alternative approach utilising the combination...
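
    A minimal sketch of this strategy, assuming a spot-volume matrix has already been produced by gel matching: fit a partial least squares regression of spot volumes against the experimental response and rank spots by the magnitude of their regression coefficients as a naive stand-in for the variable-selection step. Data here are synthetic.

```python
# Sketch: PLS regression on a 2-DE spot-volume matrix followed by naive variable
# selection via the absolute regression coefficients. Synthetic data for illustration.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n_gels, n_spots = 20, 300
X = rng.normal(size=(n_gels, n_spots))                        # relative spot volumes per gel
y = X[:, 5] - 0.5 * X[:, 42] + rng.normal(scale=0.1, size=n_gels)  # response, e.g. treatment level

pls = PLSRegression(n_components=2)
pls.fit(X, y)

coef = np.abs(pls.coef_).ravel()
top_spots = np.argsort(coef)[::-1][:10]                       # spots varying jointly with the response
print("candidate marker spots:", top_spots)
```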

  7. From remote sensing data about information extraction for 3D geovisualization - Development of a workflow

    International Nuclear Information System (INIS)

    Tiede, D.

    2010-01-01

    With an increased availability of high (spatial) resolution remote sensing imagery since the late nineties, the need to develop operative workflows for the automated extraction, provision and communication of information from such data has grown. Monitoring requirements, aimed at the implementation of environmental or conservation targets, management of (environmental) resources, and regional planning, as well as international initiatives, especially the joint initiative of the European Commission and ESA (European Space Agency) for Global Monitoring for Environment and Security (GMES), also play a major part. This thesis addresses the development of an integrated workflow for the automated provision of information derived from remote sensing data. Considering the applied data and fields of application, this work aims to design the workflow to be as generic as possible. The following research questions are discussed: What are the requirements of a workflow architecture that seamlessly links the individual workflow elements in a timely manner and effectively secures the accuracy of the extracted information? How can the workflow retain its efficiency if large amounts of data are processed? How can the workflow be improved with regard to automated object-based image analysis (OBIA)? Which recent developments could be of use? What are the limitations, or which workarounds could be applied, in order to generate relevant results? How can relevant information be prepared in a target-oriented way and communicated effectively? How can the more recently developed, freely available virtual globes be used for the delivery of conditioned information, with the third dimension considered as an additional, explicit carrier of information? Based on case studies comprising different data sets and fields of application, it is demonstrated how methods to extract and process information, as well as to effectively communicate results, can be improved and successfully combined within one workflow. It is shown that (1

  8. Addressing Risk Assessment for Patient Safety in Hospitals through Information Extraction in Medical Reports

    Science.gov (United States)

    Proux, Denys; Segond, Frédérique; Gerbier, Solweig; Metzger, Marie Hélène

    Hospital-acquired infections (HAI) are a real burden for doctors and risk surveillance experts. The impact on patients' health and the related healthcare costs is very significant and a major concern, even for rich countries. Furthermore, the data required to evaluate the threat are generally not available to experts, which prevents fast reaction. However, recent advances in computational intelligence techniques, such as information extraction, risk pattern detection in documents, and decision support systems, now make it possible to address this problem.

  9. From Specific Information Extraction to Inferences: A Hierarchical Framework of Graph Comprehension

    Science.gov (United States)

    2004-09-01

    The skill to interpret the information displayed in graphs is so important that the National Council of Teachers of Mathematics has created...guidelines to ensure that students learn these skills (NCTM: Standards for Mathematics, 2003). These guidelines are based primarily on the extraction of...graphical perception.

  10. Extracting breathing rate information from a wearable reflectance pulse oximeter sensor.

    Science.gov (United States)

    Johnston, W S; Mendelson, Y

    2004-01-01

    The integration of multiple vital physiological measurements could help combat medics and field commanders to better predict a soldier's health condition and enhance their ability to perform remote triage procedures. In this paper we demonstrate the feasibility of extracting accurate breathing rate information from a photoplethysmographic signal that was recorded by a reflectance pulse oximeter sensor mounted on the forehead and subsequently processed by a simple time domain filtering and frequency domain Fourier analysis.
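
    A minimal sketch of the processing chain described, band-limiting the photoplethysmographic signal to the respiratory band and taking the dominant Fourier peak, assuming a clean synthetic signal; filter order and band edges are illustrative.

```python
# Sketch: estimate breathing rate from a PPG signal by band-limiting to the
# respiratory band and locating the dominant FFT peak. Signal here is synthetic.
import numpy as np
from scipy.signal import butter, filtfilt

def breathing_rate_bpm(ppg, fs):
    # Keep the respiratory band (~0.1-0.5 Hz, i.e. 6-30 breaths/min).
    b, a = butter(2, [0.1, 0.5], btype="bandpass", fs=fs)
    resp = filtfilt(b, a, ppg - np.mean(ppg))
    spectrum = np.abs(np.fft.rfft(resp))
    freqs = np.fft.rfftfreq(len(resp), d=1.0 / fs)
    band = (freqs >= 0.1) & (freqs <= 0.5)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

if __name__ == "__main__":
    fs = 50.0
    t = np.arange(0, 60, 1 / fs)
    # Toy PPG: 1.2 Hz cardiac component plus respiratory baseline wander at 0.25 Hz.
    ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 0.25 * t)
    print(f"estimated breathing rate: {breathing_rate_bpm(ppg, fs):.1f} breaths/min")  # ~15
```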

  11. Extraction of land cover change information from ENVISAT-ASAR data in Chengdu Plain

    Science.gov (United States)

    Xu, Wenbo; Fan, Jinlong; Huang, Jianxi; Tian, Yichen; Zhang, Yong

    2006-10-01

    Land cover data are essential to most global change research objectives, including the assessment of current environmental conditions and the simulation of future environmental scenarios that ultimately lead to public policy development. The Chinese Academy of Sciences generated a nationwide land cover database in order to carry out the quantification and spatial characterization of land use/cover changes (LUCC) in the 1990s. To improve the reliability of the database, we need to update it from time to time, but it is difficult to obtain remote sensing data for extracting land cover change information at large scale. Optical remote sensing data are hard to acquire over the Chengdu Plain, so the objective of this research was to evaluate multitemporal ENVISAT advanced synthetic aperture radar (ASAR) data for extracting land cover change information. Based on fieldwork and the nationwide 1:100,000 land cover database, the paper assesses several land cover changes in the Chengdu Plain, for example: crop to buildings, forest to buildings, and forest to bare land. The results show that ENVISAT ASAR data have great potential for extracting land cover change information.

  12. KneeTex: an ontology-driven system for information extraction from MRI reports.

    Science.gov (United States)

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated

  13. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  14. SAR matrices: automated extraction of information-rich SAR tables from large compound data sets.

    Science.gov (United States)

    Wassermann, Anne Mai; Haebel, Peter; Weskamp, Nils; Bajorath, Jürgen

    2012-07-23

    We introduce the SAR matrix data structure that is designed to elucidate SAR patterns produced by groups of structurally related active compounds, which are extracted from large data sets. SAR matrices are systematically generated and sorted on the basis of SAR information content. Matrix generation is computationally efficient and enables processing of large compound sets. The matrix format is reminiscent of SAR tables, and SAR patterns revealed by different categories of matrices are easily interpretable. The structural organization underlying matrix formation is more flexible than standard R-group decomposition schemes. Hence, the resulting matrices capture SAR information in a comprehensive manner.

  15. Comparison of methods of extracting information for meta-analysis of observational studies in nutritional epidemiology

    Directory of Open Access Journals (Sweden)

    Jong-Myon Bae

    2016-01-01

    Full Text Available OBJECTIVES: A common method for conducting a quantitative systematic review (QSR) for observational studies related to nutritional epidemiology is the “highest versus lowest intake” method (HLM), in which only the information concerning the effect size (ES) of the highest category of a food item is collected on the basis of its lowest category. However, in the interval collapsing method (ICM), a method suggested to enable a maximum utilization of all available information, the ES information is collected by collapsing all categories into a single category. This study aimed to compare the ES and summary effect size (SES) between the HLM and ICM. METHODS: A QSR for evaluating the citrus fruit intake and risk of pancreatic cancer and calculating the SES by using the HLM was selected. The ES and SES were estimated by performing a meta-analysis using the fixed-effect model. The directionality and statistical significance of the ES and SES were used as criteria for determining the concordance between the HLM and ICM outcomes. RESULTS: No significant differences were observed in the directionality of SES extracted by using the HLM or ICM. The application of the ICM, which uses a broader information base, yielded more-consistent ES and SES, and narrower confidence intervals than the HLM. CONCLUSIONS: The ICM is advantageous over the HLM owing to its higher statistical accuracy in extracting information for QSR on nutritional epidemiology. The application of the ICM should hence be recommended for future studies.
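
    For readers unfamiliar with the fixed-effect model referred to above, the sketch below pools per-study log effect sizes with inverse-variance weights to obtain a summary effect size and its 95% confidence interval. The study values are invented and are not taken from the cited review.

```python
# Sketch: fixed-effect (inverse-variance) pooling of per-study log effect sizes.
# Study values below are illustrative, not taken from the cited meta-analysis.
import math

log_es = [-0.22, -0.10, -0.35, 0.05]      # ln(effect size) per study
se     = [0.10, 0.15, 0.20, 0.12]         # standard errors of the log effect sizes

weights = [1.0 / s**2 for s in se]                         # inverse-variance weights
pooled = sum(w * es for w, es in zip(weights, log_es)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

print(f"SES = {math.exp(pooled):.3f} "
      f"[{math.exp(ci_low):.3f}, {math.exp(ci_high):.3f}]")
```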

  16. Evaluation of FTA ® paper for storage of oral meta-genomic DNA.

    Science.gov (United States)

    Foitzik, Magdalena; Stumpp, Sascha N; Grischke, Jasmin; Eberhard, Jörg; Stiesch, Meike

    2014-10-01

    The purpose of the present study was to evaluate the short-term storage of meta-genomic DNA from native oral biofilms on FTA(®) paper. Thirteen volunteers of both sexes received an acrylic splint for intraoral biofilm formation over a period of 48 hours. The biofilms were collected, resuspended in phosphate-buffered saline, and either stored on FTA(®) paper or directly processed by standard laboratory DNA extraction. The nucleic acid extraction efficiencies were evaluated by 16S rDNA targeted SSCP fingerprinting. The acquired banding pattern of FTA-derived meta-genomic DNA was compared to a standard DNA preparation protocol. Sensitivity and positive predictive values were calculated. The volunteers showed inter-individual differences in their bacterial species composition. A total of 200 bands were found for both methods and 85% of the banding patterns were equal, representing a sensitivity of 0.941 and a false-negative predictive value of 0.059. Meta-genomic DNA sampling, extraction, and adhesion using FTA(®) paper is a reliable method for storage of microbial DNA for a short period of time.

  17. Assembly of viral genomes from metagenomes

    Directory of Open Access Journals (Sweden)

    Saskia L Smits

    2014-12-01

    Full Text Available Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered, but rather several distinct contigs derived from a single entity, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with a relatively large genome which were divergent to previously known viruses from the viral families Rhabdoviridae and Coronaviridae. Depending on specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near complete viral genomes.

  18. Assembly of viral genomes from metagenomes

    NARCIS (Netherlands)

    S.L. Smits (Saskia); R. Bodewes (Rogier); A. Ruiz-Gonzalez (Aritz); V. Baumgärtner (Volkmar); M.P.G. Koopmans D.V.M. (Marion); A.D.M.E. Osterhaus (Albert); A. Schürch (Anita)

    2014-01-01

    textabstractViral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow

  19. Tentacle: distributed quantification of genes in metagenomes.

    Science.gov (United States)

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.

  20. Separating metagenomic short reads into genomes via clustering

    Directory of Open Access Journals (Sweden)

    Tanaseichuk Olga

    2012-09-01

    Full Text Available Background: The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in high-complexity datasets where, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve sequencing efficiency and cost. On the other hand, they result in shorter reads, which makes the separation of reads from different species harder. Among the existing computational tools for metagenomic analysis, there are similarity-based methods that use reference databases to align reads and composition-based methods that use composition patterns (i.e., frequencies of short words or l-mers) to cluster reads. Similarity-based methods are unable to classify reads from unknown species without close references (which constitute the majority of reads). Since composition patterns are preserved only in significantly large fragments, composition-based tools cannot be used for very short reads, which becomes a significant limitation with the development of NGS. A recently proposed algorithm, AbundanceBin, introduced another method that bins reads based on predicted abundances of the genomes sequenced. However, it does not separate reads from genomes of similar abundance levels. Results: In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that most l-mers belong to unique genomes when l is sufficiently large. The first phase of the algorithm results in clusters of l-mers, each of which belongs to one genome. During the second phase, clusters are merged based on l-mer repeat information. These final clusters are used to assign reads. The algorithm can handle very short reads and sequencing errors. It is initially designed for genomes with similar abundance levels and then
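
    The key observation, that sufficiently long l-mers are usually unique to one genome, can be illustrated with a toy grouping of reads that share long l-mers. This is only a simplified stand-in for the two-phase algorithm described above; the read data and the value of l are hypothetical.

```python
# Toy sketch: group reads that share long l-mers, exploiting the fact that
# sufficiently long l-mers are usually unique to one genome. This is a simplified
# stand-in for the two-phase algorithm in the abstract, not its implementation.
from collections import defaultdict

def lmers(read, l):
    return {read[i:i + l] for i in range(len(read) - l + 1)}

def cluster_reads(reads, l=20):
    parent = list(range(len(reads)))                 # union-find over reads

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}                                        # l-mer -> first read carrying it
    for idx, read in enumerate(reads):
        for mer in lmers(read, l):
            if mer in seen:
                parent[find(idx)] = find(seen[mer])  # shared l-mer -> same cluster
            else:
                seen[mer] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())

if __name__ == "__main__":
    reads = ["ACGTACGTACGTACGTACGTAAAA",
             "TTTTACGTACGTACGTACGTACGT",
             "GGGGCCCCGGGGCCCCGGGGCCCC"]
    print(cluster_reads(reads, l=16))                # first two reads share a 16-mer
```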

  1. A viral metagenomic approach on a nonmetagenomic experiment

    DEFF Research Database (Denmark)

    Bovo, Samuele; Mazzoni, Gianluca; Ribani, Anisa

    2017-01-01

    Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent...... reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine...... unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses...

  2. Feature extraction and learning using context cue and Rényi entropy based mutual information

    DEFF Research Database (Denmark)

    Pan, Hong; Olsen, Søren Ingvor; Zhu, Yaping

    2015-01-01

    information. In particular, for feature extraction, we develop a new set of kernel descriptors−Context Kernel Descriptors (CKD), which enhance the original KDES by embedding the spatial context into the descriptors. Context cues contained in the context kernel enforce some degree of spatial consistency, thus...... improving the robustness of CKD. For feature learning and reduction, we propose a novel codebook learning method, based on a Rényi quadratic entropy based mutual information measure called Cauchy-Schwarz Quadratic Mutual Information (CSQMI), to learn a compact and discriminative CKD codebook. Projecting...... as the information about the underlying labels of the CKD using CSQMI. Thus the resulting codebook and reduced CKD are discriminative. We verify the effectiveness of our method on several public image benchmark datasets such as YaleB, Caltech-101 and CIFAR-10, as well as a challenging chicken feet dataset of our own...
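
    For reference, the Cauchy-Schwarz Quadratic Mutual Information named above is usually defined via the Cauchy-Schwarz divergence between the joint density and the product of the marginals; the following is the standard information-theoretic-learning form (a general definition, not a formula quoted from this record).

```latex
% Cauchy-Schwarz divergence between densities p and q
D_{CS}(p \,\|\, q) = -\log
  \frac{\left(\int p(x)\,q(x)\,dx\right)^{2}}
       {\int p(x)^{2}\,dx \;\int q(x)^{2}\,dx}

% CSQMI: divergence between the joint density and the product of the marginals
I_{CS}(X;Y) = D_{CS}\!\left(p(x,y) \,\|\, p(x)\,p(y)\right)
```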

  3. Method of extracting significant trouble information of nuclear power plants using probabilistic analysis technique

    International Nuclear Information System (INIS)

    Shimada, Yoshio; Miyazaki, Takamasa

    2005-01-01

    In order to analyze and evaluate large amounts of trouble information of overseas nuclear power plants, it is necessary to select information that is significant in terms of both safety and reliability. In this research, a method of efficiently and simply classifying degrees of importance of components in terms of safety and reliability while paying attention to root-cause components appearing in the information was developed. Regarding safety, the reactor core damage frequency (CDF), which is used in the probabilistic analysis of a reactor, was used. Regarding reliability, the automatic plant trip probability (APTP), which is used in the probabilistic analysis of automatic reactor trips, was used. These two aspects were reflected in the development of criteria for classifying degrees of importance of components. By applying these criteria, a simple method of extracting significant trouble information of overseas nuclear power plants was developed. (author)

  4. Automated concept-level information extraction to reduce the need for custom software and rules development.

    Science.gov (United States)

    D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D

    2011-01-01

    Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation is available for download.

  5. Metagenomic analyses of bacteria on human hairs: a qualitative assessment for applications in forensic science.

    Science.gov (United States)

    Tridico, Silvana R; Murray, Dáithí C; Addison, Jayne; Kirkbride, Kenneth P; Bunce, Michael

    2014-01-01

    Mammalian hairs are one of the most ubiquitous types of trace evidence collected in the course of forensic investigations. However, hairs that are naturally shed or that lack roots are problematic substrates for DNA profiling; these hair types often contain insufficient nuclear DNA to yield short tandem repeat (STR) profiles. Whilst there have been a number of initial investigations evaluating the value of metagenomics analyses for forensic applications (e.g. examination of computer keyboards), there have been no metagenomic evaluations of human hairs-a substrate commonly encountered during forensic practice. This present study attempts to address this forensic capability gap, by conducting a qualitative assessment into the applicability of metagenomic analyses of human scalp and pubic hair. Forty-two DNA extracts obtained from human scalp and pubic hairs generated a total of 79,766 reads, yielding 39,814 reads post control and abundance filtering. The results revealed the presence of unique combinations of microbial taxa that can enable discrimination between individuals and signature taxa indigenous to female pubic hairs. Microbial data from a single co-habiting couple added an extra dimension to the study by suggesting that metagenomic analyses might be of evidentiary value in sexual assault cases when other associative evidence is not present. Of all the data generated in this study, the next-generation sequencing (NGS) data generated from pubic hair held the most potential for forensic applications. Metagenomic analyses of human hairs may provide independent data to augment other forensic results and possibly provide association between victims of sexual assault and offender when other associative evidence is absent. Based on results garnered in the present study, we believe that with further development, bacterial profiling of hair will become a valuable addition to the forensic toolkit.

  6. The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2014-09-01

    Full Text Available In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS). This idea is based on the fact that texture images carry emotion-related information. The feature extraction is derived from the time-frequency representation of spectrogram images. First, we transform the spectrogram into a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII) derived from the spectrogram image can be extracted by using Laws’ masks to characterize the emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases, the Berlin Emotional Speech Database (EMO-DB) and the eNTERFACE corpus, and one self-recorded database (KHUSC-EmoDB), to evaluate the performance across corpora. The results of the proposed ESS system are presented using a support vector machine (SVM) as a classifier. Experimental results show that the proposed TII-based feature extraction inspired by visual perception can provide significant classification for ESS systems. The two-dimensional (2-D) TII feature can discriminate between different emotions in visual expressions beyond what pitch and formant tracks convey. In addition, de-noising in 2-D images can be completed more easily than de-noising in 1-D speech.
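
    A generic sketch of Laws'-mask texture energies computed on a spectrogram image, assuming the standard 1-D Laws vectors; this illustrates the idea rather than the authors' TII pipeline (the cubic contrast curve and the SVM stage are omitted), and the signal is synthetic.

```python
# Sketch: Laws' texture energy features computed on a spectrogram image.
# Generic illustration of texture-image features for speech, not the TII pipeline itself.
import numpy as np
from scipy.signal import spectrogram, convolve2d

# Standard 1-D Laws vectors; 2-D masks are their outer products.
L5 = np.array([1, 4, 6, 4, 1])        # level
E5 = np.array([-1, -2, 0, 2, 1])      # edge
S5 = np.array([-1, 0, 2, 0, -1])      # spot
R5 = np.array([1, -4, 6, -4, 1])      # ripple
VECTORS = [("L", L5), ("E", E5), ("S", S5), ("R", R5)]
MASKS = {a + b: np.outer(v, w) for (a, v) in VECTORS for (b, w) in VECTORS}

def texture_energies(signal, fs):
    _, _, sxx = spectrogram(signal, fs=fs, nperseg=256)
    img = np.log1p(sxx)                                   # spectrogram as a grayscale image
    img = img - img.mean()
    return {name: float(np.mean(np.abs(convolve2d(img, mask, mode="valid"))))
            for name, mask in MASKS.items()}

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    toy_speech = np.sin(2 * np.pi * 220 * t) * np.hanning(len(t))   # toy signal
    feats = texture_energies(toy_speech, fs)
    print(len(feats), "texture-energy features")          # 16 features with these four vectors
```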

  7. An Accurate Integral Method for Vibration Signal Based on Feature Information Extraction

    Directory of Open Access Journals (Sweden)

    Yong Zhu

    2015-01-01

    Full Text Available After summarizing the advantages and disadvantages of current integral methods, a novel vibration signal integral method based on feature information extraction was proposed. This method took full advantage of the self-adaptive filter characteristic and waveform correction feature of ensemble empirical mode decomposition in dealing with nonlinear and nonstationary signals. This research merged the superiorities of kurtosis, mean square error, energy, and singular value decomposition on signal feature extraction. The values of the four indexes aforementioned were combined into a feature vector. Then, the connotative characteristic components in vibration signal were accurately extracted by Euclidean distance search, and the desired integral signals were precisely reconstructed. With this method, the interference problem of invalid signal such as trend item and noise which plague traditional methods is commendably solved. The great cumulative error from the traditional time-domain integral is effectively overcome. Moreover, the large low-frequency error from the traditional frequency-domain integral is successfully avoided. Comparing with the traditional integral methods, this method is outstanding at removing noise and retaining useful feature information and shows higher accuracy and superiority.
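
    A rough illustration of the selection step described above (the ensemble empirical mode decomposition itself is not shown): each candidate component is scored with a kurtosis / mean-square-error / energy / singular-value feature vector, and the components closest in Euclidean distance to a reference vector are kept for reconstruction. The components, the reference vector, and the normalisation are hypothetical.

```python
# Sketch of the component-selection idea: score candidate components with a
# (kurtosis, MSE, energy, largest singular value) feature vector and keep those
# nearest to a reference vector. The EEMD decomposition itself is not shown.
import numpy as np
from scipy.stats import kurtosis

def feature_vector(component, reference_signal):
    mse = np.mean((component - reference_signal) ** 2)
    energy = np.sum(component ** 2)
    # Largest singular value of the component folded into a matrix (a simple SVD-based index).
    n = len(component) // 10
    sv = np.linalg.svd(component[: n * 10].reshape(n, 10), compute_uv=False)[0]
    return np.array([kurtosis(component), mse, energy, sv])

def select_components(components, reference_signal, keep=2):
    feats = np.array([feature_vector(c, reference_signal) for c in components])
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)  # normalise the indexes
    ref = feats.mean(axis=0)                                            # hypothetical reference vector
    dist = np.linalg.norm(feats - ref, axis=1)
    order = np.argsort(dist)[:keep]
    return sum(components[i] for i in order)                            # reconstructed signal

if __name__ == "__main__":
    t = np.linspace(0, 1, 1000)
    components = [np.sin(2 * np.pi * 5 * t),                            # useful component
                  0.05 * np.random.default_rng(0).normal(size=1000),    # noise
                  0.2 * t]                                               # trend item
    print(select_components(components, np.sin(2 * np.pi * 5 * t)).shape)
```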

  8. A cascade of classifiers for extracting medication information from discharge summaries

    Directory of Open Access Journals (Sweden)

    Halgrim Scott

    2011-07-01

    Full Text Available Background: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. Methods: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. Results: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. Conclusions: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.

  9. Three-dimensional information extraction from GaoFen-1 satellite images for landslide monitoring

    Science.gov (United States)

    Wang, Shixin; Yang, Baolin; Zhou, Yi; Wang, Futao; Zhang, Rui; Zhao, Qing

    2018-05-01

    To more efficiently use GaoFen-1 (GF-1) satellite images for landslide emergency monitoring, a Digital Surface Model (DSM) can be generated from GF-1 across-track stereo image pairs to build a terrain dataset. This study proposes a landslide 3D information extraction method based on the terrain changes of slope objects. The slope objects are mergences of segmented image objects which have similar aspects; and the terrain changes are calculated from the post-disaster Digital Elevation Model (DEM) from GF-1 and the pre-disaster DEM from GDEM V2. A high mountain landslide that occurred in Wenchuan County, Sichuan Province is used to conduct a 3D information extraction test. The extracted total area of the landslide is 22.58 ha; the displaced earth volume is 652,100 m3; and the average sliding direction is 263.83°. The accuracies of them are 0.89, 0.87 and 0.95, respectively. Thus, the proposed method expands the application of GF-1 satellite images to the field of landslide emergency monitoring.
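
    A minimal sketch of deriving landslide area, displaced volume, and an average sliding direction from co-registered pre- and post-event elevation grids; the DEMs, the threshold, and the simplified aspect convention are assumptions, not the paper's object-based procedure.

```python
# Sketch: landslide area, displaced volume and mean sliding aspect from pre- and
# post-event DEMs on a common grid. Grids and thresholds here are synthetic.
import numpy as np

def landslide_3d_info(dem_pre, dem_post, cell_size, dh_threshold=1.0):
    dh = dem_post - dem_pre                                  # elevation change per cell
    changed = np.abs(dh) > dh_threshold                      # cells affected by the slide
    area_m2 = changed.sum() * cell_size**2
    volume_m3 = np.abs(dh[changed]).sum() * cell_size**2     # |dz| times cell area, summed

    # Mean aspect (downslope direction) of the pre-event surface over the changed cells,
    # using a simplified convention (degrees, not corrected to a specific map convention).
    dz_dy, dz_dx = np.gradient(dem_pre, cell_size)
    aspect = (np.degrees(np.arctan2(dz_dy, -dz_dx)) + 360) % 360
    mean_aspect = float(np.mean(aspect[changed])) if changed.any() else float("nan")
    return area_m2, volume_m3, mean_aspect

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    pre = np.cumsum(rng.normal(size=(100, 100)), axis=0)     # toy terrain
    post = pre.copy()
    post[40:60, 40:60] -= 5.0                                # toy depletion zone
    print(landslide_3d_info(pre, post, cell_size=10.0))
```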

  10. THE EXTRACTION OF INDOOR BUILDING INFORMATION FROM BIM TO OGC INDOORGML

    Directory of Open Access Journals (Sweden)

    T.-A. Teo

    2017-07-01

    Full Text Available Indoor Spatial Data Infrastructure (indoor-SDI) is an important SDI for geospatial analysis and location-based services. A Building Information Model (BIM) has a high degree of detail in the geometric and semantic information of a building. This study proposes direct conversion schemes to extract indoor building information from BIM into OGC IndoorGML. The major steps of the research include (1) topological conversion from the building model into an indoor network model and (2) generation of IndoorGML. The topological conversion is mainly a process of generating and mapping nodes and edges from IFC to IndoorGML. A node represents each space (e.g., IfcSpace) and object (e.g., IfcDoor) in the building, while an edge represents the relationships between nodes. According to the definition of IndoorGML, the topological model in the dual space is also represented as a set of nodes and edges. These definitions of IndoorGML are the same as those of the indoor network. Therefore, we can extract the necessary data from the indoor network and easily convert them into IndoorGML based on the IndoorGML schema. The experiment used a real BIM model to examine the proposed method. The experimental results indicate that the 3D indoor model (i.e., the IndoorGML model) can be automatically derived from the IFC model by the proposed procedure. In addition, the geometry and attributes of building elements are completely and correctly converted from BIM to indoor-SDI.
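
    A toy sketch of the node/edge mapping described above: spaces and doors become nodes, and each door is linked to the spaces it bounds, giving the dual-space connectivity that IndoorGML expects. The input is a pre-parsed, hypothetical structure rather than a real IFC file, and the emitted XML is schematic only, not schema-valid IndoorGML.

```python
# Toy sketch of the topological conversion: every space and door becomes a node,
# and edges connect doors to the spaces they bound. Input is a pre-parsed,
# hypothetical structure (not a real IFC file); output XML is schematic only.
from xml.etree.ElementTree import Element, SubElement, tostring

building = {
    "spaces": ["Room101", "Room102", "Corridor1"],               # e.g. IfcSpace entities
    "doors": {"Door1": ("Room101", "Corridor1"),                 # e.g. IfcDoor entities and
              "Door2": ("Room102", "Corridor1")},                # the spaces they connect
}

nodes = building["spaces"] + list(building["doors"])
edges = [(door, space) for door, rooms in building["doors"].items() for space in rooms]

root = Element("IndoorFeatures")                                  # schematic, not schema-valid
for n in nodes:
    SubElement(root, "State", id=n)                               # dual-space node
for a, b in edges:
    SubElement(root, "Transition", frm=a, to=b)                   # connectivity edge

print(tostring(root, encoding="unicode"))
```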

  11. Methods from Information Extraction from LIDAR Intensity Data and Multispectral LIDAR Technology

    Science.gov (United States)

    Scaioni, M.; Höfle, B.; Baungarten Kersting, A. P.; Barazzetti, L.; Previtali, M.; Wujanz, D.

    2018-04-01

    LiDAR is a consolidated technology for topographic mapping and 3D reconstruction that is implemented on several platforms. In addition to the geometric information, laser intensity can be exploited to provide additional data for multiple purposes. This option has been emphasized by the availability of sensors working on different wavelengths, which are able to provide additional information for the classification of surfaces and objects. Several applications of monochromatic and multi-spectral LiDAR data have already been developed in different fields: geosciences, agriculture, forestry, building, and cultural heritage. The use of intensity data to extract measures of point cloud quality has also been developed. The paper gives an overview of the state of the art of these techniques and presents modern technologies for the acquisition of multispectral LiDAR data. In addition, the ISPRS WG III/5 on 'Information Extraction from LiDAR Intensity Data' has collected and made available a few open data sets to support scholars doing research in this field. This service is presented, and the data sets delivered so far are described.

  12. The effect of informed consent on stress levels associated with extraction of impacted mandibular third molars.

    Science.gov (United States)

    Casap, Nardy; Alterman, Michael; Sharon, Guy; Samuni, Yuval

    2008-05-01

    To evaluate the effect of informed consent on stress levels associated with removal of impacted mandibular third molars. A total of 60 patients scheduled for extraction of impacted mandibular third molars participated in this study. The patients were unaware of the study's objectives. Data from 20 patients established the baseline levels of electrodermal activity (EDA). The remaining 40 patients were randomly assigned to 2 equal groups, receiving either a detailed document of informed consent, disclosing the possible risks involved with the surgery, or a simplified version. Pulse, blood pressure, and EDA were monitored before, during, and after completion of the consent document. Changes in EDA, but not in blood pressure, were measured on completion of either version of the consent document. A greater increase in EDA was associated with the detailed version of the consent document (P = .004). A similar concomitant increase (although nonsignificant) in pulse values was monitored on completion of both versions. Completion of an overdisclosed informed consent document is associated with changes in physiological parameters. The results suggest that overdetailed listing and disclosure before extraction of impacted mandibular third molars can increase patient stress.

  13. METHODS FROM INFORMATION EXTRACTION FROM LIDAR INTENSITY DATA AND MULTISPECTRAL LIDAR TECHNOLOGY

    Directory of Open Access Journals (Sweden)

    M. Scaioni

    2018-04-01

    Full Text Available LiDAR is a consolidated technology for topographic mapping and 3D reconstruction that is implemented on several platforms. In addition to the geometric information, laser intensity can be exploited to provide additional data for multiple purposes. This option has been emphasized by the availability of sensors working on different wavelengths, which are able to provide additional information for the classification of surfaces and objects. Several applications of monochromatic and multi-spectral LiDAR data have already been developed in different fields: geosciences, agriculture, forestry, building, and cultural heritage. The use of intensity data to extract measures of point cloud quality has also been developed. The paper gives an overview of the state of the art of these techniques and presents modern technologies for the acquisition of multispectral LiDAR data. In addition, the ISPRS WG III/5 on ‘Information Extraction from LiDAR Intensity Data’ has collected and made available a few open data sets to support scholars doing research in this field. This service is presented, and the data sets delivered so far are described.

  14. Metagenomic studies of the Red Sea.

    Science.gov (United States)

    Behzad, Hayedeh; Ibarra, Martin Augusto; Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied among marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physiochemical studies describe the Red Sea as an oligotrophic environment that contains one of the warmest and saltiest waters in the world with year-round high UV radiations. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmoregulation were found in the Red Sea metagenomic databases suggesting acquisition of specific environmental adaptation by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based application. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios where global ocean temperatures are expected to rise by 1-3°C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the Red Sea, and

  15. From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

    Science.gov (United States)

    Garza, Daniel R; Dutilh, Bas E

    2015-11-01

    Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.

  16. Ten years of maintaining and expanding a microbial genome and metagenome analysis system.

    Science.gov (United States)

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2015-11-01

    Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Enrichment allows identification of diverse, rare elements in metagenomic resistome-virulome sequencing.

    Science.gov (United States)

    Noyes, Noelle R; Weinroth, Maggie E; Parker, Jennifer K; Dean, Chris J; Lakin, Steven M; Raymond, Robert A; Rovira, Pablo; Doster, Enrique; Abdo, Zaid; Martin, Jennifer N; Jones, Kenneth L; Ruiz, Jaime; Boucher, Christina A; Belk, Keith E; Morley, Paul S

    2017-10-17

    Shotgun metagenomic sequencing is increasingly utilized as a tool to evaluate ecological-level dynamics of antimicrobial resistance and virulence, in conjunction with microbiome analysis. Interest in use of this method for environmental surveillance of antimicrobial resistance and pathogenic microorganisms is also increasing. In published metagenomic datasets, the total of all resistance- and virulence-related sequences accounts for only a small fraction of the reads. In this study, a bait-capture and enrichment system that incorporates unique molecular indices to count DNA molecules and correct for enrichment bias was validated. The use of the bait-capture and enrichment system significantly increased on-target sequencing of the resistome-virulome, enabling detection of an additional 1441 gene accessions and revealing a low-abundance portion of the resistome-virulome that was more diverse and compositionally different than that detected by more traditional metagenomic assays. The low-abundance portion of the resistome-virulome also contained resistance genes with public health importance, such as extended-spectrum betalactamases, that were not detected using traditional shotgun metagenomic sequencing. In addition, the use of the bait-capture and enrichment system enabled identification of rare resistance gene haplotypes that were used to discriminate between sample origins. These results demonstrate that the rare resistome-virulome contains valuable and unique information that can be utilized for both surveillance and population genetic investigations of resistance. Access to the rare resistome-virulome using the bait-capture and enrichment system validated in this study can greatly advance our understanding of microbiome-resistome dynamics.

  18. About increasing informativity of diagnostic system of asynchronous electric motor by extracting additional information from values of consumed current parameter

    Science.gov (United States)

    Zhukovskiy, Y.; Korolev, N.; Koteleva, N.

    2018-05-01

    This article is devoted to expanding the possibilities of assessing the technical state of asynchronous electric drives from their current consumption, as well as increasing the information capacity of diagnostic methods, under conditions of limited access to equipment and incomplete information. Spectral analysis of the electric drive current can be supplemented by an analysis of the components of the Park's vector of the current. The evolution of the hodograph at the moment of appearance and during the development of defects was studied using the example of current asymmetry in the phases of an induction motor. The result of the study is a set of new diagnostic parameters for the asynchronous electric drive. The research proved that the proposed diagnostic parameters allow the type and level of a defect to be determined without stopping the equipment and taking it out of service for repair. Modern digital control and monitoring systems can use the proposed parameters, based on the stator current of an electrical machine, to improve the accuracy and reliability of obtaining diagnostic patterns and predicting their changes, in order to improve equipment maintenance systems. This approach can also be used in systems and objects where there are significant parasitic vibrations and unsteady loads. The extraction of useful information can be carried out in electric drive systems that include a power electric converter.
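
    The Park's vector referred to above is commonly computed from the three stator phase currents with the standard transform below; this sketch and the simple hodograph-shape index are generic illustrations, not the new diagnostic parameters proposed in the article.

```python
# Sketch: Park's vector (i_d, i_q) from three-phase stator currents and a simple
# hodograph-shape index. Generic textbook transform, not the paper's new parameters.
import numpy as np

def parks_vector(ia, ib, ic):
    i_d = np.sqrt(2 / 3) * ia - ib / np.sqrt(6) - ic / np.sqrt(6)
    i_q = ib / np.sqrt(2) - ic / np.sqrt(2)
    return i_d, i_q

if __name__ == "__main__":
    t = np.linspace(0, 0.2, 5000)
    w = 2 * np.pi * 50
    # Hypothetical phase asymmetry: phase a carries 10% extra current.
    ia = 1.1 * np.cos(w * t)
    ib = np.cos(w * t - 2 * np.pi / 3)
    ic = np.cos(w * t + 2 * np.pi / 3)
    i_d, i_q = parks_vector(ia, ib, ic)
    r = np.hypot(i_d, i_q)
    # A healthy machine gives a near-circular hodograph; asymmetry makes it elliptic.
    print(f"hodograph ellipticity index: {(r.max() - r.min()) / r.mean():.3f}")
```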

  19. Multi-Paradigm and Multi-Lingual Information Extraction as Support for Medical Web Labelling Authorities

    Directory of Open Access Journals (Sweden)

    Martin Labsky

    2010-10-01

    Full Text Available Until recently, quality labelling of medical web content has been a predominantly manual activity. However, the advances in automated text processing opened the way to computerised support of this activity. The core enabling technology is information extraction (IE). However, the heterogeneity of websites offering medical content imposes particular requirements on the IE techniques to be applied. In the paper we discuss these requirements and describe a multi-paradigm approach to IE addressing them. Experiments on multi-lingual data are reported. The research has been carried out within the EU MedIEQ project.

  20. Scholarly Information Extraction Is Going to Make a Quantum Leap with PubMed Central (PMC).

    Science.gov (United States)

    Matthies, Franz; Hahn, Udo

    2017-01-01

    With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text processing when biomedical text analytics simply reuse existing NLP pipelines which were developed on the basis of abstracts (rather than full texts). We conducted experiments with four different relation extraction engines all of which were top performers in previous BioNLP Event Extraction Challenges. We found that abstract-trained engines lose up to 6.6% F-score points when run on full-text data. Hence, the reuse of existing abstract-based NLP software in a full-text scenario is considered harmful because of heavy performance losses. Given the current lack of annotated full-text resources to train on, our study quantifies the price paid for this short cut.

  1. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze these data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  2. Metagenomics as a tool to obtain full genomes of process-critical bacteria in engineered systems

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Tyson, Gene W.

    Bacteria play a pivotal role in engineered systems such as wastewater treatment plants. Obtaining genomes of the bacteria provides the genetic potential of the system and also allows studies of in situ functions through transcriptomics and proteomics. Hence, it enables correlations of operational ... Metagenomics, the sequencing of bulk genomic DNA from environmental samples, has the potential to provide genomes of this uncultured majority. However, so far only few bacterial genomes have been obtained from metagenomic data. In this study we present a new approach to obtain individual genomes from metagenomes. We deeply ... of the community. The assembled genomes include many of the process-critical bacteria involved in wastewater treatment, such as Competibacter, Tetrasphaera and TM7. The approach is not limited to different extraction methods, but can be applied to any treatment that results in different relative abundance ...

  3. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information

    Science.gov (United States)

    Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia

    2018-05-01

    Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.

  4. Genomics and metagenomics in medical microbiology.

    Science.gov (United States)

    Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

    2013-12-01

    Over the last two decades, sequencing tools have evolved from laborious time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially using next generation sequencing (NGS), has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but also helped improve diagnosis, pathogen typing, virulence and antibiotic resistance detection, and the development of new vaccines and culture media. In addition, NGS also enabled efficient analysis of complex human micro-floras, both commensal and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Construction and screening of marine metagenomic libraries.

    Science.gov (United States)

    Weiland, Nancy; Löscher, Carolin; Metzger, Rebekka; Schmitz, Ruth

    2010-01-01

    Marine microbial communities are highly diverse and have evolved during extended evolutionary processes of physiological adaptation under the influence of a variety of ecological conditions and selection pressures. They harbor an enormous diversity of microbes with still unknown and probably new physiological characteristics. In addition, the surfaces of marine multicellular organisms are typically covered by a consortium of epibiotic bacteria and act as barriers, where diverse interactions between microorganisms and hosts take place. Thus, the microbial diversity in the water column of the oceans and the microbial consortia on the tissues of marine multicellular organisms are rich sources for isolating novel bioactive compounds and genes. Here we describe the sampling and construction of large-insert metagenomic libraries from marine habitats and, as an example, one function-based screen of metagenomic clones.

  6. Developing an Approach to Prioritize River Restoration using Data Extracted from Flood Risk Information System Databases.

    Science.gov (United States)

    Vimal, S.; Tarboton, D. G.; Band, L. E.; Duncan, J. M.; Lovette, J. P.; Corzo, G.; Miles, B.

    2015-12-01

    Prioritizing river restoration requires information on river geometry. In many US states, detailed river geometry has been collected for floodplain mapping and is available in Flood Risk Information Systems (FRIS). In particular, North Carolina has, for its 100 counties, developed a database of numerous HEC-RAS models, which are available through its Flood Risk Information System (FRIS). These models, which include over 260 variables, were developed and updated by numerous contractors. They contain detailed surveyed or LiDAR-derived cross-sections and modeled flood extents for different extreme-event return periods. In this work, data from over 4700 HEC-RAS models were integrated and upscaled to utilize detailed cross-section information and 100-year modeled flood extent information to enable river restoration prioritization for the entire state of North Carolina. We developed procedures to extract geomorphic properties such as the entrenchment ratio and incision ratio from these models. The entrenchment ratio quantifies the vertical containment of rivers, and thereby their vulnerability to flooding, while the incision ratio quantifies depth per unit width. A map of entrenchment ratio for the whole state was derived by linking these model results to a geodatabase. A ranking of highly entrenched counties, enabling prioritization for flood allowance and mitigation, was obtained. The results were shared through HydroShare, and web maps were developed for their visualization using the Google Maps Engine API.
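
    As a loose illustration only: the record does not give the exact formulas used in the study, so the sketch below assumes simple Rosgen-style definitions (entrenchment ratio as modeled flood width over bankfull width, incision ratio as depth per unit width) and hypothetical column names for a table of per-cross-section properties extracted from hydraulic models.

      # Hedged sketch: computing simple geomorphic ratios from per-cross-section
      # widths/depths extracted from hydraulic models. Column names and the exact
      # ratio definitions are assumptions for illustration, not the study's method.
      import pandas as pd

      xs = pd.DataFrame({
          "river_station": [100, 200, 300],
          "bankfull_width_m": [18.0, 25.0, 12.0],
          "bankfull_depth_m": [1.6, 1.2, 2.4],
          "flood100_width_m": [22.0, 110.0, 14.0],   # modeled 100-yr flood extent width
      })

      # Entrenchment ratio ~ lateral flood-prone width relative to channel width
      # (low values suggest a vertically confined, i.e. entrenched, reach).
      xs["entrenchment_ratio"] = xs["flood100_width_m"] / xs["bankfull_width_m"]
      # Incision ratio ~ depth per unit width, per the record's description.
      xs["incision_ratio"] = xs["bankfull_depth_m"] / xs["bankfull_width_m"]

      # Rank reaches so the most entrenched (smallest ratio) float to the top.
      print(xs.sort_values("entrenchment_ratio").head())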

  7. Extracting Low-Frequency Information from Time Attenuation in Elastic Waveform Inversion

    Science.gov (United States)

    Guo, Xuebao; Liu, Hong; Shi, Ying; Wang, Weihong

    2017-03-01

    Low-frequency information is crucial for recovering background velocity, but the lack of low-frequency information in field data makes inversion impractical without accurate initial models. Laplace-Fourier domain waveform inversion can recover a smooth model from real data without low-frequency information, which can then be used as an ideal starting model for subsequent inversion. In general, it also starts with low frequencies and includes higher frequencies at later inversion stages, the difference being that its ultralow-frequency information comes from the Laplace-Fourier domain. Meanwhile, a direct implementation of the Laplace-transformed wavefield using frequency-domain inversion is also very convenient. However, because broad frequency bands are often used in pure time-domain waveform inversion, it is difficult to extract the wavefields dominated by low frequencies in that case. In this paper, low-frequency components are constructed by introducing time attenuation into the recorded residuals, and the rest of the method is identical to traditional time-domain inversion. Time windowing and frequency filtering are also applied to mitigate the ambiguity of the inverse problem. Therefore, we can start at low frequencies and then move to higher frequencies. The experiment shows that the proposed method can achieve a good inversion result given a linear initial model and records without low-frequency information.
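
    A minimal numerical sketch of the idea described above, assuming a synthetic residual trace made of two arrivals: damping with exp(-sigma*t) suppresses late-arriving energy and produces a usable ultralow-frequency (Laplace) component. The wavelet, damping constants and sampling are illustrative choices, not values from the paper.

      # Minimal sketch of the idea described above: multiplying a recorded residual
      # by exp(-sigma * t) (Laplace-style time attenuation) down-weights late
      # arrivals and yields a nonzero ultralow-frequency (Laplace) component.
      import numpy as np

      dt = 0.002                                   # sample interval [s] (assumed)
      t = np.arange(0.0, 4.0, dt)

      def ricker(t, t0, f0=8.0):
          """Ricker wavelet centred at t0 with peak frequency f0 [Hz]."""
          a = (np.pi * f0 * (t - t0)) ** 2
          return (1.0 - 2.0 * a) * np.exp(-a)

      # Stand-in residual: an early and a late arrival (purely illustrative).
      residual = ricker(t, 0.6) + 0.8 * ricker(t, 2.5)

      for sigma in (0.0, 1.0, 3.0):                # damping constants [1/s]
          damped = residual * np.exp(-sigma * t)   # time attenuation
          laplace_dc = np.sum(damped) * dt         # 0-Hz (Laplace) component
          late_share = np.abs(damped[t > 1.5]).sum() / np.abs(damped).sum()
          print(f"sigma={sigma:.1f}: Laplace component={laplace_dc:+.4e}, "
                f"late-arrival share={late_share:.2f}")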

  8. An Experimental Metagenome Data Management and Analysis System

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  9. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  10. Approaching the largest ‘API’: extracting information from the Internet with Python

    Directory of Open Access Journals (Sweden)

    Jonathan E. Germann

    2018-02-01

    This article explores the need for libraries to algorithmically access and manipulate the world's largest API: the Internet. The billions of pages on the 'Internet API' (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of all sizes. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
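
    In the spirit of the article, a small self-contained example using two of the packages it names (urllib from the standard library and BeautifulSoup); the target URL is a placeholder, and real projects should also honour robots.txt and rate limits.

      # Small example: fetch a page with urllib (stdlib) and extract its title
      # and outgoing links with BeautifulSoup. The URL is a placeholder.
      from urllib.request import urlopen, Request
      from bs4 import BeautifulSoup  # pip install beautifulsoup4

      url = "https://example.org/"                     # placeholder target
      req = Request(url, headers={"User-Agent": "library-data-project/0.1"})
      html = urlopen(req, timeout=10).read()

      soup = BeautifulSoup(html, "html.parser")
      title = soup.title.get_text(strip=True) if soup.title else ""
      links = [a.get("href") for a in soup.find_all("a") if a.get("href")]

      print(title)
      print(f"{len(links)} links found; first few: {links[:5]}")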

  11. DEVELOPMENT OF AUTOMATIC EXTRACTION METHOD FOR ROAD UPDATE INFORMATION BASED ON PUBLIC WORK ORDER OUTLOOK

    Science.gov (United States)

    Sekimoto, Yoshihide; Nakajo, Satoru; Minami, Yoshitaka; Yamaguchi, Syohei; Yamada, Harutoshi; Fuse, Takashi

    Recently, the disclosure of statistical data on the financial effects or burden of public works through the web sites of national and local governments has made it possible to discuss macroscopic financial trends. However, it is still difficult to grasp, nationwide, how each location was changed by public works. The purpose of this research is to efficiently collect the road update information provided by various road managers, in order to enable efficient updating of maps such as car navigation maps. In particular, we develop a system that automatically extracts relevant public works from the public work order outlooks released by local governments and registers summaries, including position information, in a database by combining several web mining technologies. Finally, we collect and register several tens of thousands of records from web sites all over Japan and confirm the feasibility of our method.

  12. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    Science.gov (United States)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.

  13. Inexperienced clinicians can extract pathoanatomic information from MRI narrative reports with high reproducibility for use in research/quality assurance

    DEFF Research Database (Denmark)

    Kent, Peter; Briggs, Andrew M; Albert, Hanne Birgit

    2011-01-01

    Background Although reproducibility in reading MRI images amongst radiologists and clinicians has been studied previously, no studies have examined the reproducibility of inexperienced clinicians in extracting pathoanatomic information from magnetic resonance imaging (MRI) narrative reports and t...

  14. [Extraction of buildings' three-dimensional information from high-resolution satellite imagery based on Barista software].

    Science.gov (United States)

    Zhang, Pei-feng; Hu, Yuan-man; He, Hong-shi

    2010-05-01

    The demand for accurate and up-to-date spatial information on urban buildings is becoming more and more important for urban planning, environmental protection, and other applications. Today's commercial high-resolution satellite imagery offers the potential to extract the three-dimensional information of urban buildings. This paper extracted the three-dimensional information of urban buildings from QuickBird imagery and validated the precision of the extraction using Barista software. It was shown that the extraction of three-dimensional building information from high-resolution satellite imagery based on Barista software had the advantages of a low demand for professional expertise, broad applicability, simple operation, and high precision. Point positioning and height determination accuracy at the one-pixel level could be achieved if the digital elevation model (DEM) and sensor orientation model were of sufficiently high precision and the off-nadir view angle was favourable.

  15. MetaQUAST: evaluation of metagenome assemblies.

    Science.gov (United States)

    Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

    2016-04-01

    During the past years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for their metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such metagenome dataset features as (i) unknown species content, by detecting and downloading reference sequences, (ii) huge diversity, by giving comprehensive reports for multiple genomes, and (iii) the presence of closely related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. http://bioinf.spbau.ru/metaquast aleksey.gurevich@spbu.ru Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Phylogenetic convolutional neural networks in metagenomics.

    Science.gov (United States)

    Fioravanti, Diego; Giarratano, Ylenia; Maggio, Valerio; Agostinelli, Claudio; Chierici, Marco; Jurman, Giuseppe; Furlanello, Cesare

    2018-03-08

    Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and to a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
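
    A hedged sketch of one ingredient of the described pipeline: embedding a precomputed patristic distance matrix into Euclidean space with MDS and ranking each variable's phylogenetic neighbours. The distance matrix here is random, and the sparsified MDS variant and custom Keras layer of Ph-CNN are not reproduced.

      # Hedged sketch: embed a precomputed (here random, symmetric) patristic
      # distance matrix in Euclidean space with MDS, then rank each feature's
      # neighbours. Ph-CNN itself uses a sparsified MDS and a custom Keras layer.
      import numpy as np
      from sklearn.manifold import MDS

      rng = np.random.default_rng(42)
      n_features = 20                                    # e.g. bacterial taxa
      d = rng.random((n_features, n_features))
      patristic = (d + d.T) / 2.0                        # symmetric stand-in distances
      np.fill_diagonal(patristic, 0.0)

      embedding = MDS(n_components=2, dissimilarity="precomputed",
                      random_state=0).fit_transform(patristic)

      # Rank neighbours of each taxon by Euclidean distance in the embedding;
      # such ranked lists are what a phylogeny-aware convolution would consume.
      pairwise = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
      neighbour_ranks = np.argsort(pairwise, axis=1)[:, 1:]   # drop self (rank 0)
      print(neighbour_ranks[:3])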

  17. A retrospective metagenomics approach to studying Blastocystis.

    Science.gov (United States)

    Andersen, Lee O'Brien; Bonde, Ida; Nielsen, Henrik Bjørn; Stensvold, Christen Rune

    2015-07-01

    Blastocystis is a common single-celled intestinal parasitic genus, comprising several subtypes. Here, we screened data obtained by metagenomic analysis of faecal DNA for Blastocystis by searching for subtype-specific genes in co-abundance gene groups, which are groups of genes that covary across a selection of 316 human faecal samples, hence representing genes originating from a single subtype. The 316 faecal samples were from 236 healthy individuals, 13 patients with Crohn's disease (CD) and 67 patients with ulcerative colitis (UC). The prevalence of Blastocystis was 20.3% in the healthy individuals and 14.9% in patients with UC. Meanwhile, Blastocystis was absent in patients with CD. Individuals with intestinal microbiota dominated by Bacteroides were much less prone to having Blastocystis-positive stool (Matthews correlation coefficient = -0.25, P < 0.0001) than individuals with Ruminococcus- and Prevotella-driven enterotypes. This is the first study to investigate the relationship between Blastocystis and communities of gut bacteria using a metagenomics approach. The study serves as an example of how it is possible to retrospectively investigate microbial eukaryotic communities in the gut using metagenomic datasets targeting the bacterial component of the intestinal microbiome and the interplay between these microbial communities. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. Bayesian mixture analysis for metagenomic community profiling.

    Science.gov (United States)

    Morfopoulou, Sofia; Plagnol, Vincent

    2015-09-15

    Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix sofia.morfopoulou.10@ucl.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  19. Overview of image processing tools to extract physical information from JET videos

    Science.gov (United States)

    Craciunescu, T.; Murari, A.; Gelfusa, M.; Tiseanu, I.; Zoita, V.; EFDA Contributors, JET

    2014-11-01

    In magnetic confinement nuclear fusion devices such as JET, the last few years have witnessed a significant increase in the use of digital imagery, not only for the surveying and control of experiments, but also for the physical interpretation of results. More than 25 cameras are routinely used for imaging on JET in the infrared (IR) and visible spectral regions. These cameras can produce up to tens of Gbytes per shot and their information content can be very different, depending on the experimental conditions. However, the relevant information about the underlying physical processes is generally of much reduced dimensionality compared to the recorded data. The extraction of this information, which allows full exploitation of these diagnostics, is a challenging task. The image analysis consists, in most cases, of inverse problems which are typically ill-posed mathematically. The typology of objects to be analysed is very wide, and usually the images are affected by noise, low levels of contrast, low grey-level in-depth resolution, reshaping of moving objects, etc. Moreover, the plasma events have time constants of ms or tens of ms, which imposes tough conditions for real-time applications. On JET, in the last few years new tools and methods have been developed for physical information retrieval. The methodology of optical flow has allowed, under certain assumptions, the derivation of information about the dynamics of video objects associated with different physical phenomena, such as instabilities, pellets and filaments. The approach has been extended in order to approximate the optical flow within the MPEG compressed domain, allowing the manipulation of the large JET video databases and, in specific cases, even real-time data processing. The fast visible camera may provide new information that is potentially useful for disruption prediction. A set of methods, based on the extraction of structural information from the visual scene, have been developed for the
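
    For readers unfamiliar with the optical-flow step mentioned above, a generic sketch using OpenCV's dense Farneback method on two synthetic frames is shown below; this is not the JET analysis pipeline, and the MPEG-compressed-domain approximation is not reproduced.

      # Generic sketch of a dense optical-flow computation of the kind mentioned
      # above, applied to two synthetic frames with a displaced bright blob.
      import numpy as np
      import cv2  # pip install opencv-python

      h, w = 128, 160
      frame1 = np.zeros((h, w), dtype=np.uint8)
      frame2 = np.zeros((h, w), dtype=np.uint8)
      cv2.circle(frame1, (60, 64), 10, 255, -1)   # a bright blob ("filament")...
      cv2.circle(frame2, (66, 64), 10, 255, -1)   # ...shifted by ~6 px between frames

      # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n,
      # poly_sigma, flags (standard defaults from the OpenCV documentation).
      flow = cv2.calcOpticalFlowFarneback(frame1, frame2, None,
                                          0.5, 3, 15, 3, 5, 1.2, 0)
      mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
      print(f"peak estimated displacement: {mag.max():.2f} px (true shift ~6 px)")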

  20. Overview of image processing tools to extract physical information from JET videos

    International Nuclear Information System (INIS)

    Craciunescu, T; Tiseanu, I; Zoita, V; Murari, A; Gelfusa, M

    2014-01-01

    In magnetic confinement nuclear fusion devices such as JET, the last few years have witnessed a significant increase in the use of digital imagery, not only for the surveying and control of experiments, but also for the physical interpretation of results. More than 25 cameras are routinely used for imaging on JET in the infrared (IR) and visible spectral regions. These cameras can produce up to tens of Gbytes per shot and their information content can be very different, depending on the experimental conditions. However, the relevant information about the underlying physical processes is generally of much reduced dimensionality compared to the recorded data. The extraction of this information, which allows full exploitation of these diagnostics, is a challenging task. The image analysis consists, in most cases, of inverse problems which are typically ill-posed mathematically. The typology of objects to be analysed is very wide, and usually the images are affected by noise, low levels of contrast, low grey-level in-depth resolution, reshaping of moving objects, etc. Moreover, the plasma events have time constants of ms or tens of ms, which imposes tough conditions for real-time applications. On JET, in the last few years new tools and methods have been developed for physical information retrieval. The methodology of optical flow has allowed, under certain assumptions, the derivation of information about the dynamics of video objects associated with different physical phenomena, such as instabilities, pellets and filaments. The approach has been extended in order to approximate the optical flow within the MPEG compressed domain, allowing the manipulation of the large JET video databases and, in specific cases, even real-time data processing. The fast visible camera may provide new information that is potentially useful for disruption prediction. A set of methods, based on the extraction of structural information from the visual scene, have been developed for the

  1. Extraction and Analysis of Information Related to Research & Development Declared Under an Additional Protocol

    International Nuclear Information System (INIS)

    Idinger, J.; Labella, R.; Rialhe, A.; Teller, N.

    2015-01-01

    The additional protocol (AP) provides important tools to strengthen and improve the effectiveness and efficiency of the safeguards system. Safeguards are designed to verify that States comply with their international commitments not to use nuclear material or to engage in nuclear-related activities for the purpose of developing nuclear weapons or other nuclear explosive devices. Under an AP based on INFCIRC/540, a State must provide to the IAEA additional information about, and inspector access to, all parts of its nuclear fuel cycle. In addition, the State has to supply information about its nuclear fuel cycle-related research and development (R&D) activities. The majority of States declare their R&D activities under the AP Articles 2.a.(i), 2.a.(x), and 2.b.(i) as part of initial declarations and their annual updates under the AP. In order to verify consistency and completeness of information provided under the AP by States, the Agency has started to analyze declared R&D information by identifying interrelationships between States in different R&D areas relevant to safeguards. The paper outlines the quality of R&D information provided by States to the Agency, describes how the extraction and analysis of relevant declarations are currently carried out at the Agency and specifies what kinds of difficulties arise during evaluation in respect to cross-linking international projects and finding gaps in reporting. In addition, the paper tries to elaborate how the reporting quality of AP information with reference to R&D activities and the assessment process of R&D information could be improved. (author)

  2. Exploration of soil metagenome diversity for prospection of enzymes involved in lignocellulosic biomass conversion

    Energy Technology Data Exchange (ETDEWEB)

    Alvarez, T.M.; Squina, F.M. [Laboratorio Nacional de Luz Sincrotron (LNLS), Campinas, SP (Brazil); Paixao, D.A.A.; Franco Cairo, J.P.L.; Buchli, F.; Ruller, R. [Laboratorio Nacional de Ciencia e Tecnologia do Bioetanol (CTBE), Campinas, SP (Brazil); Prade, R. [Oklahoma State University, Sillwater, OK (United States)

    2012-07-01

    Metagenomics allows access to the genetic information encoded in the DNA of microorganisms recalcitrant to cultivation. These organisms represent a reservoir of novel biocatalysts with potential application in environmentally friendly techniques aiming to overcome the dependence on fossil fuels and also to diminish air and water pollution. The focus of our work is the generation of a tool kit of lignocellulolytic enzymes from the soil metagenome, which could be used for second-generation ethanol production. Environmental samples were collected at a sugarcane field after harvesting, where it is expected that the microbial population involved in lignocellulose degradation was enriched due to the presence of straw covering the soil. The Sugarcane Bagasse-Degrading Soil (SBDS) metagenome was sequenced by massively parallel 454 Roche pyrosequencing. We identified a full repertoire of genes with significant matches to glycosyl hydrolase catalytic domains and carbohydrate-binding modules. Soil metagenomic libraries cloned into pUC19 were screened through functional assays. CMC-agar screening resulted in positive clones, revealing new cellulase-coding genes. Through a CMC zymogram it was possible to observe that one of these genes, designated E-1, corresponds to an enzyme that is secreted to the extracellular medium, suggesting that the cloned gene carried the original signal peptide. Enzymatic assays and analysis by capillary electrophoresis showed that E-1 was able to cleave internal glycosidic bonds of cellulose. New rounds of functional screening with chromogenic substrates are being conducted, aiming at the generation of a library of lignocellulolytic enzymes derived from the soil metagenome, which may become a key component for the development of second-generation biofuels. (author)

  3. Genome diversity of marine phages recovered from Mediterranean metagenomes: Size matters.

    Directory of Open Access Journals (Sweden)

    Mario López-Pérez

    2017-09-01

    Marine viruses play a critical role not only in the global geochemical cycles but also in the biology and evolution of their hosts. Despite their importance, viral diversity remains underexplored, mostly due to sampling and cultivation challenges. Direct sequencing approaches such as viromics have provided new insights into the marine viral world. As a complementary approach, we analysed 24 microbial metagenomes (>0.2 μm size range) obtained from six sites in the Mediterranean Sea that vary by depth, season and the filter used to retrieve the fraction. Filter-size comparison showed a significant number of viral sequences that were retained on the larger-pore filters and were different from those found in the viral fraction from the same sample, indicating that some important viral information is missed when using only assembly from viromes. In addition, we were able to describe 1,323 viral genomic fragments that were more than 10 kb in length, of which 36 represented complete viral genomes, including some retrieved from a cross-assembly of different metagenomes. Host prediction based on sequence methods revealed new phage groups belonging to marine prokaryotes like SAR11, Cyanobacteria or SAR116. We also identified the first complete virophage from deep seawater and a new endemic clade of the recently discovered Marine Group II Euryarchaeota virus. Furthermore, analysis of viral distribution using metagenomes and viromes indicated that most of the new phages were found exclusively in the Mediterranean Sea, and some of them, mostly those recovered from deep metagenomes, do not recruit in any database, probably indicating higher variability and endemicity in Mediterranean bathypelagic waters. Together these data provide the first detailed picture of the genomic diversity and the spatial and depth variations of viral communities within the Mediterranean Sea using metagenome assembly.

  4. Zone analysis in biology articles as a basis for information extraction.

    Science.gov (United States)

    Mizuta, Yoko; Korhonen, Anna; Mullen, Tony; Collier, Nigel

    2006-06-01

    In the field of biomedicine, an overwhelming amount of experimental data has become available as a result of the high throughput of research in this domain. The amount of results reported has now grown beyond the limits of what can be managed by manual means. This makes it increasingly difficult for the researchers in this area to keep up with the latest developments. Information extraction (IE) in the biological domain aims to provide an effective automatic means to dynamically manage the information contained in archived journal articles and abstract collections and thus help researchers in their work. However, while considerable advances have been made in certain areas of IE, pinpointing and organizing factual information (such as experimental results) remains a challenge. In this paper we propose tackling this task by incorporating into IE information about rhetorical zones, i.e. classification of spans of text in terms of argumentation and intellectual attribution. As the first step towards this goal, we introduce a scheme for annotating biological texts for rhetorical zones and provide a qualitative and quantitative analysis of the data annotated according to this scheme. We also discuss our preliminary research on automatic zone analysis, and its incorporation into our IE framework.

  5. Extract the Relational Information of Static Features and Motion Features for Human Activities Recognition in Videos

    Directory of Open Access Journals (Sweden)

    Li Yao

    2016-01-01

    Both static features and motion features have shown promising performance in the human activity recognition task. However, the information included in these features is insufficient for complex human activities. In this paper, we propose extracting the relational information of static features and motion features for human activity recognition. The videos are represented by a classical Bag-of-Words (BoW) model, which has proven useful in many works. To get a compact and discriminative codebook with small dimension, we employ a divisive algorithm based on KL-divergence to reconstruct the codebook. After that, to further capture strong relational information, we construct a bipartite graph to model the relationship between words of the different feature sets. Then we use a k-way partition to create a new codebook in which similar words are grouped together. With this new codebook, videos can be represented by a new BoW vector with strong relational information. Moreover, we propose a method to compute new clusters from the divisive algorithm's projective function. We test our work on several datasets and obtain very promising results.

  6. Quantitative metagenomic analyses based on average genome size normalization

    DEFF Research Database (Denmark)

    Frank, Jeremy Alexander; Sørensen, Søren Johannes

    2011-01-01

    ... provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data ... marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how ... These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis.
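
    A toy calculation (made-up numbers) of the normalization idea: expressing hits to a trait gene per "genome equivalent", where genome equivalents are estimated as sequenced base pairs divided by average genome size, so that samples with different average genome sizes become directly comparable.

      # Toy illustration (made-up numbers) of average-genome-size normalization:
      # hits to a trait gene are expressed per "genome equivalent" so that samples
      # with different average genome sizes (AGS) can be compared directly.
      samples = {
          #            total bp sequenced, AGS (bp),  hits to trait gene family
          "soil":     dict(total_bp=4.0e9, ags=6.0e6, trait_hits=310),
          "seawater": dict(total_bp=4.0e9, ags=2.5e6, trait_hits=310),
      }

      for name, s in samples.items():
          genome_equivalents = s["total_bp"] / s["ags"]       # genomes sampled
          per_genome = s["trait_hits"] / genome_equivalents   # hits per genome
          print(f"{name:9s}: {genome_equivalents:8.0f} genome equivalents, "
                f"{per_genome:.3f} trait-gene copies per genome "
                f"(~{100 * per_genome:.1f}% of genomes if the gene is single-copy)")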

  7. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on GitHub, a cloud image is available, and an example implementation is available online.

  8. Stable isotope probing in the metagenomics era: a bridge towards improved bioremediation

    Science.gov (United States)

    Uhlik, Ondrej; Leewis, Mary-Cathrine; Strejcek, Michal; Musilova, Lucie; Mackova, Martina; Leigh, Mary Beth; Macek, Tomas

    2012-01-01

    Microbial biodegradation and biotransformation reactions are essential to most bioremediation processes, yet the specific organisms, genes, and mechanisms involved are often not well understood. Stable isotope probing (SIP) enables researchers to directly link microbial metabolic capability to phylogenetic and metagenomic information within a community context by tracking isotopically labeled substances into phylogenetically and functionally informative biomarkers. SIP is thus applicable as a tool for the identification of active members of the microbial community and associated genes integral to the community functional potential, such as biodegradative processes. The rapid evolution of SIP over the last decade and integration with metagenomics provides researchers with a much deeper insight into potential biodegradative genes, processes, and applications, thereby enabling an improved mechanistic understanding that can facilitate advances in the field of bioremediation. PMID:23022353

  9. MedEx: a medication information extraction system for clinical narratives

    Science.gov (United States)

    Stenner, Shane P; Doan, Son; Johnson, Kevin B; Waitman, Lemuel R; Denny, Joshua C

    2010-01-01

    Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free-text. As such, they are not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation using a data set of 50 discharge summaries showed it performed well on identifying not only drug names (F-measure 93.2%), but also signature information, such as strength, route, and frequency, with F-measures of 94.5%, 93.9%, and 96.0% respectively. We then applied MedEx unchanged to outpatient clinic visit notes. It performed similarly with F-measures over 90% on a set of 25 clinic visit notes. PMID:20064797

  10. Videomicroscopic extraction of specific information on cell proliferation and migration in vitro

    International Nuclear Information System (INIS)

    Debeir, Olivier; Megalizzi, Veronique; Warzee, Nadine; Kiss, Robert; Decaestecker, Christine

    2008-01-01

    In vitro cell imaging is a useful exploratory tool for cell behavior monitoring with a wide range of applications in cell biology and pharmacology. Combined with appropriate image analysis techniques, this approach has been shown to provide useful information on the detection and dynamic analysis of cell events. In this context, numerous efforts have been focused on cell migration analysis. In contrast, the cell division process has been the subject of fewer investigations. The present work focuses on this latter aspect and shows that, in complement to cell migration data, interesting information related to cell division can be extracted from phase-contrast time-lapse image series, in particular cell division duration, which is not provided by standard cell assays using endpoint analyses. We illustrate our approach by analyzing the effects induced by two sigma-1 receptor ligands (haloperidol and 4-IBP) on the behavior of two glioma cell lines using two in vitro cell models, i.e., the low-density individual cell model and the high-density scratch wound model. This illustration also shows that the data provided by our approach are suggestive as to the mechanism of action of compounds, and are thus capable of informing the appropriate selection of further time-consuming and more expensive biological evaluations required to elucidate a mechanism.

  11. 5W1H Information Extraction with CNN-Bidirectional LSTM

    Science.gov (United States)

    Nurdin, A.; Maulidevi, N. U.

    2018-03-01

    In this work, information about who did what, when, where, why, and how in Indonesian news articles was extracted by combining a Convolutional Neural Network and a Bidirectional Long Short-Term Memory network. A Convolutional Neural Network can learn semantically meaningful representations of sentences. A Bidirectional LSTM can analyze the relations among words in the sequence. We also use word2vec word embeddings for word representation. By combining these algorithms, we obtained an F-measure of 0.808. Our experiments show that CNN-BLSTM outperforms other shallow methods, namely IBk, C4.5, and Naïve Bayes, with F-measures of 0.655, 0.645, and 0.595, respectively.
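
    A hedged sketch of a generic CNN plus bidirectional LSTM tagger of the kind described above; the vocabulary size, sequence length, tag set and hyperparameters are assumptions, and pretrained word2vec vectors would in practice be loaded into the Embedding layer.

      # Hedged sketch of a generic CNN + bidirectional LSTM sequence tagger.
      # Sizes and hyperparameters are assumptions, not the paper's settings.
      from tensorflow.keras import layers, models

      vocab_size, max_len, emb_dim = 20000, 100, 300
      n_tags = 7      # e.g. WHO, WHAT, WHEN, WHERE, WHY, HOW, O (assumed tag set)

      model = models.Sequential([
          layers.Input(shape=(max_len,)),
          layers.Embedding(vocab_size, emb_dim),          # load word2vec weights here
          layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
          layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
          layers.TimeDistributed(layers.Dense(n_tags, activation="softmax")),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.summary()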

  12. Developing a Process Model for the Forensic Extraction of Information from Desktop Search Applications

    Directory of Open Access Journals (Sweden)

    Timothy Pavlic

    2008-03-01

    Desktop search applications can contain cached copies of files that were deleted from the file system. Forensic investigators see this as a potential source of evidence, as documents deleted by suspects may still exist in the cache. Whilst there have been attempts at recovering data collected by desktop search applications, there is no methodology governing the process, nor discussion on the most appropriate means to do so. This article seeks to address this issue by developing a process model that can be applied when developing an information extraction application for desktop search applications, discussing preferred methods and the limitations of each. This work represents a more structured approach than other forms of current research.

  13. An innovative method for extracting isotopic information from low-resolution gamma spectra

    International Nuclear Information System (INIS)

    Miko, D.; Estep, R.J.; Rawool-Sullivan, M.W.

    1998-01-01

    A method is described for the extraction of isotopic information from attenuated gamma ray spectra using the gross-count material basis set (GC-MBS) model. This method solves for the isotopic composition of an unknown mixture of isotopes attenuated through an absorber of unknown material. For binary isotopic combinations the problem is nonlinear in only one variable and is easily solved using standard line optimization techniques. Results are presented for NaI spectrum analyses of various binary combinations of enriched uranium, depleted uranium, low burnup Pu, 137 Cs, and 133 Ba attenuated through a suite of absorbers ranging in Z from polyethylene through lead. The GC-MBS method results are compared to those computed using ordinary response function fitting and with a simple net peak area method. The GC-MBS method was found to be significantly more accurate than the other methods over the range of absorbers and isotopic blends studied
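
    As a generic illustration of the one-variable optimization mentioned above (not the GC-MBS model itself, whose basis-set responses and attenuation treatment are not given in this record), the sketch below fits the mixing fraction of a synthetic two-component spectrum with a bounded scalar optimizer.

      # Generic illustration of fitting a binary mixture with a one-variable
      # optimizer: find the fraction f that best reproduces a measured spectrum
      # as f*A + (1-f)*B. The synthetic basis spectra are placeholders.
      import numpy as np
      from scipy.optimize import minimize_scalar

      rng = np.random.default_rng(1)
      channels = np.arange(256)
      basis_a = np.exp(-0.5 * ((channels - 90) / 12.0) ** 2)    # isotope A response
      basis_b = np.exp(-0.5 * ((channels - 170) / 18.0) ** 2)   # isotope B response

      true_f = 0.35
      measured = true_f * basis_a + (1 - true_f) * basis_b
      measured += rng.normal(scale=0.01, size=channels.size)    # counting noise

      def misfit(f):
          model = f * basis_a + (1 - f) * basis_b
          return np.sum((measured - model) ** 2)

      result = minimize_scalar(misfit, bounds=(0.0, 1.0), method="bounded")
      print(f"estimated fraction of isotope A: {result.x:.3f} (true {true_f})")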

  14. SmashCommunity: A metagenomic annotation and analysis tool

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U

    2010-01-01

    ... the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses. AVAILABILITY: SmashCommunity is freely available at http://www.bork.embl.de/software/smash CONTACT: bork@embl.de

  15. Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics.

    Science.gov (United States)

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2016-04-01

    The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes (i.e., genes conferring antibiotic resistance). Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomics reads and selected sequencing platforms had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets.

  16. Unlocking the potential of metagenomics through replicated experimental design

    NARCIS (Netherlands)

    Knight, R.; Jansson, J.; Field, D.; Fierer, N.; Desai, N.; Fuhrman, J.A.; Hugenholtz, P.; Van der Lelie, D.; Meyer, F.; Stevens, R.; Bailey, M.J.; Gordon, J.I.; Kowalchuk, G.A.; Gilbert, J.A.

    2012-01-01

    Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or drivers of processes relevant to disease, industry and the environment. In the past two years, we have seen a paradigm shift in metagenomics to the application of cross-sectional and longitudinal

  18. Cross-cutting activities: Soil quality and soil metagenomics

    OpenAIRE

    Motavalli, Peter P.; Garrett, Karen A.

    2008-01-01

    This presentation reports on the work of the SANREM CRSP cross-cutting activities "Assessing and Managing Soil Quality for Sustainable Agricultural Systems" and "Soil Metagenomics to Construct Indicators of Soil Degradation." The introduction gives an overview of the extent of soil degradation globally and defines soil quality. The objectives of the soil quality cross-cutting activity and of CCRA-4 (Soil Metagenomics) are then outlined.

  19. EnvMine: A text-mining system for the automatic extraction of contextual information

    Directory of Open Access Journals (Sweden)

    de Lorenzo Victor

    2010-06-01

    Background: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be made in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of the distance between individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical ...

  20. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

    Science.gov (United States)

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

    2018-01-01

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888

  1. Draft Genome Sequence of Uncultured SAR324 Bacterium lautmerah10, Binned from a Red Sea Metagenome

    KAUST Repository

    Haroon, Mohamed; Thompson, Luke R.; Stingl, Ulrich

    2016-01-01

    A draft genome of SAR324 bacterium lautmerah10 was assembled from a metagenome of a surface water sample from the Red Sea, Saudi Arabia. The genome is more complete and has a higher G+C content than that of previously sequenced SAR324 representatives. Its genomic information shows a versatile metabolism that confers an advantage to SAR324, which is reflected in its distribution throughout different depths of the marine water column.

  2. Draft Genome Sequence of Uncultured SAR324 Bacterium lautmerah10, Binned from a Red Sea Metagenome

    KAUST Repository

    Haroon, Mohamed

    2016-02-11

    A draft genome of SAR324 bacterium lautmerah10 was assembled from a metagenome of a surface water sample from the Red Sea, Saudi Arabia. The genome is more complete and has a higher G+C content than that of previously sequenced SAR324 representatives. Its genomic information shows a versatile metabolism that confers an advantage to SAR324, which is reflected in its distribution throughout different depths of the marine water column.

  3. Metagenomics of Bacterial Diversity in Villa Luz Caves with Sulfur Water Springs

    Directory of Open Access Journals (Sweden)

    Giuseppe D’Auria

    2018-01-01

    New biotechnology applications require in-depth preliminary studies of biodiversity. Massive sequencing methods using metagenomics and bioinformatics tools offer sufficient and reliable knowledge to understand environmental diversity, to discover new microorganisms, and to take advantage of their functional genes. The Villa Luz caves, in the southern Mexican state of Tabasco, are fed by at least 26 groundwater inlets containing 300–500 mg L-1 H2S and <0.1 mg L-1 O2. We extracted environmental DNA for metagenomic analysis of samples collected at five selected Villa Luz cave sites, with pH values from 2.5 to 7. The organisms found in this underground ecosystem can oxidize H2S to H2SO4. These include biovermiculites, a bacterial association that can grow on the rock walls; snottites, which are whitish, viscous biofilms hanging from the rock walls; and sacks or bags of phlegm, which live within the aquatic environment of the springs. Through tag-encoded FLX amplicon pyrosequencing (TEFAP), a total of 20,901 reads of amplification products from the hypervariable regions V1 and V3 of the bacterial 16S rRNA gene were generated from whole and pure metagenomic DNA samples. Seven bacterial phyla were identified, with Proteobacteria more frequent than Acidobacteria. Finally, acidophilic Proteobacteria were detected in the UJAT5 sample.

  4. Extraction of prospecting information of uranium deposit based on high spatial resolution satellite data. Taking bashibulake region as an example

    International Nuclear Information System (INIS)

    Yang Xu; Liu Dechang; Zhang Jielin

    2008-01-01

    In this study, the significance and content of prospecting information for uranium deposits are described. QuickBird high spatial resolution satellite data are used to extract prospecting information for uranium deposits in the Bashibulake area in the northern Tarim Basin. Using appropriate image processing methods, information on the ore-bearing bed, ore-controlling structures and mineralized alteration has been extracted. The results show high consistency with the field survey. The aim of this study is to explore the practicability of high spatial resolution satellite data for mineral prospecting and to broaden approaches to prospecting in similar areas. (authors)

  5. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Canon, Shane

    2011-10-12

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  6. Analysis of composition-based metagenomic classification.

    Science.gov (United States)

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values of n result in sparse frequency vectors that represent similar metagenomic fragments differently, also leading to low configuration scores. Regarding the similarity measure, in
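
    To make the framework concrete, the following Python sketch (an illustration only, with invented toy fragments and labels; it is not the authors' score function) builds n-mer frequency profiles and computes a crude class-separability score, here the ratio of the mean between-class distance to the mean within-class distance:

      # Minimal sketch: n-mer frequency profiles and a crude separability score.
      # Not the paper's score function; it only illustrates measuring how well
      # n-mer profiles separate labelled taxonomic classes.
      from itertools import product
      from collections import Counter
      import numpy as np

      def nmer_profile(seq, n=4):
          """Normalized frequency vector over all DNA n-mers of length n."""
          kmers = ["".join(p) for p in product("ACGT", repeat=n)]
          counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
          vec = np.array([counts.get(k, 0) for k in kmers], dtype=float)
          total = vec.sum()
          return vec / total if total > 0 else vec

      def separability(profiles, labels):
          """Mean between-class distance divided by mean within-class distance
          (higher = classes are easier to tell apart)."""
          profiles, labels = np.asarray(profiles), np.asarray(labels)
          within, between = [], []
          for i in range(len(profiles)):
              for j in range(i + 1, len(profiles)):
                  d = np.linalg.norm(profiles[i] - profiles[j])
                  (within if labels[i] == labels[j] else between).append(d)
          return np.mean(between) / np.mean(within)

      if __name__ == "__main__":
          # Toy fragments and labels, invented purely for demonstration.
          frags = ["ACGTACGTGGCCAATT" * 4, "ACGTACGTGGCCAATA" * 4,
                   "TTTTGGGGCCCCAAAA" * 4, "TTTAGGGGCCCCAAAT" * 4]
          labels = ["taxonA", "taxonA", "taxonB", "taxonB"]
          profs = [nmer_profile(f, n=3) for f in frags]
          print("separability score:", round(separability(profs, labels), 3))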

  7. Extracting chemical information from high-resolution Kβ X-ray emission spectroscopy

    Science.gov (United States)

    Limandri, S.; Robledo, J.; Tirao, G.

    2018-06-01

    High-resolution X-ray emission spectroscopy allows studying the chemical environment of a wide variety of materials. Chemical information can be obtained by fitting the X-ray spectra and observing the behavior of some spectral features. Spectral changes can also be quantified by means of statistical parameters calculated by considering the spectrum as a probability distribution. Another possibility is to perform statistical multivariate analysis, such as principal component analysis. In this work, the performance of these procedures for extracting chemical information from X-ray emission spectra of mixtures of Mn2+ and Mn4+ oxides is studied. A detailed analysis of the parameters obtained, together with the associated uncertainties, is presented. The methodologies are also applied for Mn oxidation state characterization of the double perovskite oxides Ba1+xLa1-xMnSbO6 (with 0 ≤ x ≤ 0.7). The results show that statistical parameters and multivariate analysis are the most suitable for the analysis of this kind of spectra.
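
    As a rough illustration of treating each spectrum as a probability distribution and applying multivariate analysis, the sketch below uses synthetic Gaussian line shapes standing in for Kβ spectra (the energies, widths, noise level and mixture fractions are invented) to compute simple moment-based parameters and PCA scores:

      # Hedged illustration on synthetic spectra, not the paper's data or code.
      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(4)
      energy = np.linspace(6475, 6500, 500)          # eV axis (arbitrary choice)

      def peak(center, width):
          return np.exp(-0.5 * ((energy - center) / width) ** 2)

      # Synthetic spectra: mixtures of two "oxidation state" line shapes.
      fractions = np.linspace(0, 1, 11)
      spectra = np.array([(1 - x) * peak(6490.2, 1.2) + x * peak(6491.0, 1.4)
                          + 0.01 * rng.standard_normal(energy.size) for x in fractions])

      def moments(spec):
          p = np.clip(spec, 0, None)
          p = p / p.sum()                            # spectrum as a distribution
          mean = np.sum(energy * p)
          var = np.sum((energy - mean) ** 2 * p)
          skew = np.sum(((energy - mean) / np.sqrt(var)) ** 3 * p)
          return mean, var, skew

      print("first moment of each spectrum:",
            [round(moments(s)[0], 2) for s in spectra])

      scores = PCA(n_components=2).fit_transform(spectra)   # multivariate view
      print("PC1 scores (vary with the mixture fraction):",
            np.round(scores[:, 0], 2))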

  8. Information Extraction of Tourist Geological Resources Based on 3d Visualization Remote Sensing Image

    Science.gov (United States)

    Wang, X.

    2018-04-01

    Tourism geological resources are of high aesthetic, scientific and educational value and need to be protected and rationally utilized. In the past, most remote sensing investigations of tourism geological resources used two-dimensional interpretation methods, which made some geological heritage features difficult to interpret and led to the omission of information. The aim of this paper is to assess the value of a method that uses three-dimensional visualization of remote sensing imagery to extract information on geological heritage. The Skyline software system is applied to fuse 0.36 m aerial images with a 5 m interval DEM to establish a digital earth model. Based on three-dimensional shape, color tone, shadow, texture and other image features, the distribution of tourism geological resources in Shandong Province and the locations of geological heritage sites were obtained, including geological structures, Daigu landforms, granite landforms, volcanic landforms, sandy landforms, waterscapes, etc. The results show that remote sensing interpretation with this method is highly recognizable, making the interpretation more accurate and comprehensive.

  9. Information Management Processes for Extraction of Student Dropout Indicators in Courses in Distance Mode

    Directory of Open Access Journals (Sweden)

    Renata Maria Abrantes Baracho

    2016-04-01

    Full Text Available This research addresses the use of information management processes to extract student dropout indicators in distance mode courses. Distance education in Brazil aims to facilitate access to information. The MEC (Ministry of Education) announced, in the second semester of 2013, that the main obstacles faced by institutions offering courses in this mode were students dropping out and the resistance of both educators and students to this mode. The research used a mixed methodology, qualitative and quantitative, to obtain student dropout indicators. The factors found and validated in this research were: lack of interest from students, insufficient training of students in the use of the virtual learning environment, structural problems in the schools chosen to offer the course, students without e-mail, incoherent answers to course activities, and students' lack of skill in using the computer tool. The scenario considered was a course offered in distance mode called Aluno Integrado (Integrated Student).

  10. Measuring nuclear reaction cross sections to extract information on neutrinoless double beta decay

    Science.gov (United States)

    Cavallaro, M.; Cappuzzello, F.; Agodi, C.; Acosta, L.; Auerbach, N.; Bellone, J.; Bijker, R.; Bonanno, D.; Bongiovanni, D.; Borello-Lewin, T.; Boztosun, I.; Branchina, V.; Bussa, M. P.; Calabrese, S.; Calabretta, L.; Calanna, A.; Calvo, D.; Carbone, D.; Chávez Lomelí, E. R.; Coban, A.; Colonna, M.; D'Agostino, G.; De Geronimo, G.; Delaunay, F.; Deshmukh, N.; de Faria, P. N.; Ferraresi, C.; Ferreira, J. L.; Finocchiaro, P.; Fisichella, M.; Foti, A.; Gallo, G.; Garcia, U.; Giraudo, G.; Greco, V.; Hacisalihoglu, A.; Kotila, J.; Iazzi, F.; Introzzi, R.; Lanzalone, G.; Lavagno, A.; La Via, F.; Lay, J. A.; Lenske, H.; Linares, R.; Litrico, G.; Longhitano, F.; Lo Presti, D.; Lubian, J.; Medina, N.; Mendes, D. R.; Muoio, A.; Oliveira, J. R. B.; Pakou, A.; Pandola, L.; Petrascu, H.; Pinna, F.; Reito, S.; Rifuggiato, D.; Rodrigues, M. R. D.; Russo, A. D.; Russo, G.; Santagati, G.; Santopinto, E.; Sgouros, O.; Solakci, S. O.; Souliotis, G.; Soukeras, V.; Spatafora, A.; Torresi, D.; Tudisco, S.; Vsevolodovna, R. I. M.; Wheadon, R. J.; Yildirin, A.; Zagatto, V. A. B.

    2018-02-01

    Neutrinoless double beta decay (0νββ) is considered the most promising way to access the absolute neutrino mass scale. Moreover, if observed, it will signal that neutrinos are their own anti-particles (Majorana particles). Presently, this physics case is one of the most important research topics “beyond the Standard Model” and might guide the way towards a Grand Unified Theory of fundamental interactions. Since the 0νββ decay process involves nuclei, its analysis necessarily implies nuclear structure issues. In the NURE project, supported by a Starting Grant of the European Research Council (ERC), nuclear reactions of double charge-exchange (DCE) are used as a tool to extract information on the 0νββ Nuclear Matrix Elements. Indeed, in DCE reactions and ββ decay the initial and final nuclear states are the same and the transition operators have a similar structure. Thus the measurement of absolute DCE cross-sections can give crucial information on ββ matrix elements. In a wider view, the NUMEN international collaboration plans a major upgrade of the INFN-LNS facilities in the next years in order to increase the experimental production of the nuclei of interest by at least two orders of magnitude, thus making feasible a systematic study of all the cases of interest as candidates for 0νββ.

  11. Unsupervised Symbolization of Signal Time Series for Extraction of the Embedded Information

    Directory of Open Access Journals (Sweden)

    Yue Li

    2017-03-01

    Full Text Available This paper formulates an unsupervised algorithm for symbolization of signal time series to capture the embedded dynamic behavior. The key idea is to convert time series of the digital signal into a string of (spatially) discrete symbols from which the embedded dynamic information can be extracted in an unsupervised manner (i.e., no requirement for labeling of time series). The main challenges here are: (1) definition of the symbol assignment for the time series; (2) identification of the partitioning segment locations in the signal space of time series; and (3) construction of probabilistic finite-state automata (PFSA) from the symbol strings that contain temporal patterns. The reported work addresses these challenges by maximizing the mutual information measures between symbol strings and PFSA states. The proposed symbolization method has been validated by numerical simulation as well as by experimentation in a laboratory environment. Performance of the proposed algorithm has been compared to that of two commonly used algorithms of time series partitioning.
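
    The sketch below illustrates the general idea on a toy signal; it uses a fixed quantile partition rather than the paper's mutual-information-maximizing partition, so it is only a simplified stand-in for the published algorithm:

      # Illustrative sketch only: quantile-based symbolization of a 1-D signal and
      # a probabilistic symbol-to-symbol transition matrix (a depth-1 PFSA).
      import numpy as np

      def symbolize(x, n_symbols=4):
          """Map each sample to a symbol 0..n_symbols-1 using quantile boundaries."""
          edges = np.quantile(x, np.linspace(0, 1, n_symbols + 1)[1:-1])
          return np.digitize(x, edges)

      def transition_matrix(symbols, n_symbols=4):
          """Row-normalized transition probabilities between successive symbols."""
          counts = np.zeros((n_symbols, n_symbols))
          for a, b in zip(symbols[:-1], symbols[1:]):
              counts[a, b] += 1
          row_sums = counts.sum(axis=1, keepdims=True)
          return np.divide(counts, row_sums,
                           out=np.zeros_like(counts), where=row_sums > 0)

      if __name__ == "__main__":
          # Toy signal: a noisy sinusoid standing in for a measured time series.
          t = np.linspace(0, 20 * np.pi, 2000)
          signal = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
          syms = symbolize(signal, n_symbols=4)
          print(np.round(transition_matrix(syms, n_symbols=4), 2))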

  12. A methodology for the extraction of quantitative information from electron microscopy images at the atomic level

    International Nuclear Information System (INIS)

    Galindo, P L; Pizarro, J; Guerrero, E; Guerrero-Lebrero, M P; Scavello, G; Yáñez, A; Sales, D L; Herrera, M; Molina, S I; Núñez-Moraleda, B M; Maestre, J M

    2014-01-01

    In this paper we describe a methodology developed at the University of Cadiz (Spain) in the past few years for the extraction of quantitative information from electron microscopy images at the atomic level. This work is based on a coordinated and synergic activity of several research groups that have been working together over the last decade in two different and complementary fields: Materials Science and Computer Science. The aim of our joint research has been to develop innovative high-performance computing techniques and simulation methods in order to address computationally challenging problems in the analysis, modelling and simulation of materials at the atomic scale, providing significant advances with respect to existing techniques. The methodology involves several fundamental areas of research including the analysis of high resolution electron microscopy images, materials modelling, image simulation and 3D reconstruction using quantitative information from experimental images. These techniques for the analysis, modelling and simulation allow optimizing the control and functionality of devices developed using materials under study, and have been tested using data obtained from experimental samples

  13. Dual-wavelength phase-shifting digital holography selectively extracting wavelength information from wavelength-multiplexed holograms.

    Science.gov (United States)

    Tahara, Tatsuki; Mori, Ryota; Kikunaga, Shuhei; Arai, Yasuhiko; Takaki, Yasuhiro

    2015-06-15

    Dual-wavelength phase-shifting digital holography that selectively extracts wavelength information from five wavelength-multiplexed holograms is presented. Specific phase shifts for respective wavelengths are introduced to remove the crosstalk components and extract only the object wave at the desired wavelength from the holograms. Object waves in multiple wavelengths are selectively extracted by utilizing 2π ambiguity and the subtraction procedures based on phase-shifting interferometry. Numerical results show the validity of the proposed technique. The proposed technique is also experimentally demonstrated.

  14. Information Extraction and Dependency on Open Government Data (ogd) for Environmental Monitoring

    Science.gov (United States)

    Abdulmuttalib, Hussein

    2016-06-01

    Environmental monitoring practices support decision makers in government and private institutions, as well as environmentalists and planners. This support helps them act towards the sustainability of our environment and take efficient measures for protecting human beings, but it is difficult to extract useful information from Open Government Data (OGD) and to assure its quality for this purpose. Monitoring itself comprises detecting changes as they happen, or within the mitigation period, which means that any data source used for monitoring should reflect the information related to the period of environmental monitoring; otherwise it is of little more than historical value. In this paper, the extraction and structuring of information from OGD that can be useful for environmental monitoring is assessed, looking into its availability, its usefulness for a certain type of environmental monitoring, its update period and its dependences. The assessment is performed on a small sample selected from OGD, bearing in mind the type of environmental change monitored, such as the increase and concentration of built-up areas, the reduction of green areas, or the change of temperature in a specific area. The World Bank mentioned in its blog that data is open if it satisfies both conditions of being technically open and legally open. The use of open data is thus regulated by published terms of use, or an agreement which implies some conditions without violating these two conditions. Within the scope of this paper I share the experience of using some OGD to support an environmental monitoring work performed to mitigate the production of carbon dioxide by regulating energy consumption and by properly designing the test area's landscapes, thus using Geodesign tactics, and meanwhile wish to add to the results achieved by many

  15. Metagenomic Analysis of Chicken Gut Microbiota for Improving Metabolism and Health of Chickens — A Review

    Directory of Open Access Journals (Sweden)

    Ki Young Choi

    2015-09-01

    Full Text Available Chicken is a major food source for humans, hence it is important to understand the mechanisms involved in nutrient absorption in chickens. In the gastrointestinal tract (GIT), the microbiota plays a central role in enhancing nutrient absorption and strengthening the immune system, thereby affecting both the growth and health of chickens. There is little information on the diversity and functions of the chicken GIT microbiota, its impact on the host, and the interactions between the microbiota and the host. Here, we review recent metagenomic strategies for analyzing the composition of the chicken GIT microbiota and its functions related to improving metabolism and health. We summarize metagenomic methodology for obtaining bacterial taxonomic and functional inferences about the GIT microbiota and suggest a set of indicator genes for monitoring and manipulating the microbiota to promote host health in the future.

  16. Genomic and metagenomic challenges and opportunities for bioleaching: a mini-review.

    Science.gov (United States)

    Cárdenas, Juan Pablo; Quatrini, Raquel; Holmes, David S

    2016-09-01

    High-throughput genomic technologies are accelerating progress in understanding the diversity of microbial life in many environments. Here we highlight advances in genomics and metagenomics of microorganisms from bioleaching heaps and related acidic mining environments. Bioleaching heaps used for copper recovery provide significant opportunities to study the processes and mechanisms underlying microbial successions and the influence of community composition on ecosystem functioning. Obtaining quantitative and process-level knowledge of these dynamics is pivotal for understanding how microorganisms contribute to the solubilization of copper for industrial recovery. Advances in DNA sequencing technology provide unprecedented opportunities to obtain information about the genomes of bioleaching microorganisms, allowing predictive models of metabolic potential and ecosystem-level interactions to be constructed. These approaches are enabling predictive phenotyping of organisms, many of which are recalcitrant to genetic approaches or are unculturable. This mini-review describes current bioleaching genomic and metagenomic projects and addresses the use of genome information to: (i) build metabolic models; (ii) predict microbial interactions; (iii) estimate genetic diversity; and (iv) study microbial evolution. Key challenges and perspectives of bioleaching genomics/metagenomics are addressed. Copyright © 2016 The Author(s). Published by Elsevier Masson SAS. All rights reserved.

  17. Metagenome Fragment Classification Using N-Mer Frequency Profiles

    Directory of Open Access Journals (Sweden)

    Gail Rosen

    2008-01-01

    Full Text Available A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the mass amounts of metagenomic data, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.
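
    A minimal sketch of the underlying idea, not the authors' NBC implementation, is shown below: toy "reference genomes" are turned into N-mer count vectors and a multinomial naive Bayes classifier scores a short query fragment (the sequences and names are invented for illustration):

      # Hedged sketch of the general approach: multinomial naive Bayes over
      # N-mer count vectors of reference genomes, then scoring a short read.
      from itertools import product
      from collections import Counter
      import numpy as np
      from sklearn.naive_bayes import MultinomialNB

      N = 4
      KMERS = ["".join(p) for p in product("ACGT", repeat=N)]

      def counts(seq):
          c = Counter(seq[i:i + N] for i in range(len(seq) - N + 1))
          return [c.get(k, 0) for k in KMERS]

      # Toy "reference genomes" (in practice: full genome sequences per organism).
      refs = {"genomeA": "ACGTACGTGGCCAATT" * 50,
              "genomeB": "TTTAGGGGCCCCAAAT" * 50}
      X = np.array([counts(s) for s in refs.values()])
      y = list(refs.keys())

      clf = MultinomialNB(alpha=1.0).fit(X, y)          # Laplace smoothing
      read = "ACGTACGTGGCCAATTACGTACGT"                 # short query fragment
      print(clf.predict([counts(read)])[0])             # most likely source genome
      print(clf.predict_log_proba([counts(read)]))      # log-posteriors per genome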

  18. New Bacterial Phytase through Metagenomic Prospection

    Directory of Open Access Journals (Sweden)

    Nathálya Farias

    2018-02-01

    Full Text Available Alkaline phytases from uncultured microorganisms, which hydrolyze phytate to less phosphorylated myo-inositols and inorganic phosphate, have great potential as additives in the agricultural industry. The development of metagenomics has stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on Earth. In this study, a gene encoding a phytase was cloned from red rice crop residues and castor bean cake using a metagenomics strategy. The amino acid identity between this gene and its closest published counterparts is lower than 60%. The phytase was named PhyRC001 and was biochemically characterized. This recombinant protein showed activity on sodium phytate, indicating that PhyRC001 is a hydrolase enzyme. The enzymatic activity was optimal at a pH of 7.0 and a temperature of 35 °C. β-propeller phytases possess great potential as feed additives because they are the only type of phytase with high activity at neutral pH. Therefore, exploring and exploiting the mechanisms underlying β-propeller phytase function could be of great benefit to biotechnology.

  19. Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction.

    Science.gov (United States)

    Napolitano, Giulio; Marshall, Adele; Hamilton, Peter; Gavin, Anna T

    2016-06-01

    Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging. The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: 'semi-structured' and 'unstructured'. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry. The best result of 99.4% accuracy - which included only one semi-structured report predicted as unstructured - was produced by the layout classifier with the k-nearest-neighbour algorithm, using the binary term occurrence word vector type with stopword filtering and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones. These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
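
    The layout-classification step can be approximated with standard tooling, as in the hedged sketch below (scikit-learn rather than RapidMiner; the toy reports and labels are invented), combining binary term occurrence, stopword filtering and a k-nearest-neighbour classifier:

      # Rough sketch of the layout classifier under stated assumptions; not the
      # study's RapidMiner workflow and not trained on real registry data.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline

      reports = [
          "Tumour size: 22 mm. ER: positive. Nodes positive: 2.",    # semi-structured
          "Specimen type: wide local excision. Grade: 2. PR: neg.",  # semi-structured
          "The specimen shows an invasive carcinoma measuring about two centimetres "
          "with two of five nodes involved and strong oestrogen receptor staining.",  # unstructured
          "Sections show invasive ductal carcinoma; hormone receptors are positive "
          "and a single lymph node contains metastatic tumour.",                      # unstructured
      ]
      labels = ["semi-structured", "semi-structured", "unstructured", "unstructured"]

      clf = make_pipeline(
          CountVectorizer(binary=True, stop_words="english"),  # binary term occurrence
          KNeighborsClassifier(n_neighbors=1),
      )
      clf.fit(reports, labels)
      print(clf.predict(["Tumour size: 15 mm. Nodes positive: 0."]))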

  20. Study of time-frequency characteristics of single snores: extracting new information for sleep apnea diagnosis

    Energy Technology Data Exchange (ETDEWEB)

    Castillo Escario, Y.; Blanco Almazan, D.; Camara Vazquez, M.A.; Jane Campos, R.

    2016-07-01

    Obstructive sleep apnea (OSA) is a highly prevalent chronic disease, especially in the elderly and obese population. Despite constituting a huge health and economic problem, most patients remain undiagnosed due to limitations in current diagnostic strategies. Therefore, it is essential to find cost-effective diagnostic alternatives. One of these novel approaches is the analysis of acoustic snoring signals. Snoring is an early symptom of OSA which carries pathophysiological information of high diagnostic value. For this reason, the main objective of this work is to study the characteristics of single snores of different types, from healthy and OSA subjects. To do so, we analyzed snoring signals from previous databases and developed an experimental protocol to record simulated OSA-related sounds and characterize the response of two commercial tracheal microphones. Automatic programs for filtering, downsampling, event detection and time-frequency analysis were built in MATLAB. We found that time-frequency maps and spectral parameters (central, mean and peak frequency and energy in the 100-500 Hz band) allow distinguishing regular snores of healthy subjects from non-regular snores and snores of OSA subjects. Regarding the two commercial microphones, we found that one of them was a suitable snoring sensor, while the other had too restricted a frequency response. Future work shall include a higher number of episodes and subjects, but our study has contributed to showing how important the differences between regular and non-regular snores can be for OSA diagnosis, and how much clinically relevant information can be extracted from time-frequency maps and spectral parameters of single snores. (Author)
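
    The spectral parameters mentioned above can be illustrated with a short Python sketch (the original work used MATLAB; the synthetic "snore" below is just a noisy 180 Hz tone invented for demonstration):

      # Minimal sketch, not the authors' pipeline: Welch power spectral density of
      # one sound segment and the spectral parameters named above (peak frequency,
      # mean/centroid frequency, central/median frequency, 100-500 Hz band energy).
      import numpy as np
      from scipy.signal import welch

      def spectral_parameters(x, fs):
          f, pxx = welch(x, fs=fs, nperseg=1024)
          peak_f = f[np.argmax(pxx)]
          mean_f = np.sum(f * pxx) / np.sum(pxx)        # spectral centroid
          cum = np.cumsum(pxx) / np.sum(pxx)
          central_f = f[np.searchsorted(cum, 0.5)]      # median frequency
          band = (f >= 100) & (f <= 500)
          band_energy = pxx[band].sum() / pxx.sum()     # relative band energy
          return peak_f, mean_f, central_f, band_energy

      if __name__ == "__main__":
          fs = 4000
          t = np.arange(0, 1.0, 1 / fs)
          rng = np.random.default_rng(1)
          snore = np.sin(2 * np.pi * 180 * t) + 0.3 * rng.standard_normal(t.size)
          print(spectral_parameters(snore, fs))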

  1. Quantum measurement information as a key to energy extraction from local vacuums

    International Nuclear Information System (INIS)

    Hotta, Masahiro

    2008-01-01

    In this paper, a protocol is proposed in which energy extraction from local vacuum states is possible by using quantum measurement information for the vacuum state of quantum fields. In the protocol, Alice, who stays at a spatial point, excites the ground state of the fields by a local measurement. Consequently, wave packets generated by Alice's measurement propagate through the vacuum to spatial infinity. Let us assume that Bob stays away from Alice and fails to catch the excitation energy when the wave packets pass in front of him. Next, Alice announces her local measurement result to Bob by classical communication. Bob performs a local unitary operation depending on the measurement result. In this process, positive energy is released from the fields to Bob's apparatus performing the unitary operation. In the field systems, wave packets with negative energy are generated around Bob's location. Soon afterwards, the negative-energy wave packets begin to chase after the positive-energy wave packets generated by Alice and form loosely bound states.

  2. Oxygen octahedra picker: A software tool to extract quantitative information from STEM images

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Yi, E-mail: y.wang@fkf.mpg.de; Salzberger, Ute; Sigle, Wilfried; Eren Suyolcu, Y.; Aken, Peter A. van

    2016-09-15

    In perovskite oxide based materials and hetero-structures there are often strong correlations between oxygen octahedral distortions and functionality. Thus, atomistic understanding of the octahedral distortion, which requires accurate measurements of atomic column positions, will greatly help to engineer their properties. Here, we report the development of a software tool to extract quantitative information of the lattice and of BO{sub 6} octahedral distortions from STEM images. Center-of-mass and 2D Gaussian fitting methods are implemented to locate positions of individual atom columns. The precision of atomic column distance measurements is evaluated on both simulated and experimental images. The application of the software tool is demonstrated using practical examples. - Highlights: • We report a software tool for mapping atomic positions from HAADF and ABF images. • It enables quantification of both crystal lattice and oxygen octahedral distortions. • We test the measurement accuracy and precision on simulated and experimental images. • It works well for different orientations of perovskite structures and interfaces.
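
    As an illustration of the 2D Gaussian fitting step (a generic sketch on a synthetic image patch, not the published tool itself), the following code refines the sub-pixel position of a single atomic column:

      # Illustrative only: fit a 2-D Gaussian to a small patch of a STEM-like
      # image to locate one atomic column with sub-pixel precision.
      import numpy as np
      from scipy.optimize import curve_fit

      def gauss2d(coords, amp, x0, y0, sigma, offset):
          x, y = coords
          return (amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
                  + offset).ravel()

      def fit_column(patch):
          ny, nx = patch.shape
          y, x = np.mgrid[0:ny, 0:nx]
          p0 = [patch.max() - patch.min(), nx / 2, ny / 2, 2.0, patch.min()]
          popt, _ = curve_fit(gauss2d, (x, y), patch.ravel(), p0=p0)
          return popt[1], popt[2]   # sub-pixel (x, y) of the column centre

      if __name__ == "__main__":
          # Synthetic 15x15 patch with a column at (7.3, 6.8) plus noise.
          y, x = np.mgrid[0:15, 0:15]
          rng = np.random.default_rng(2)
          patch = (np.exp(-((x - 7.3) ** 2 + (y - 6.8) ** 2) / (2 * 2.0 ** 2))
                   + 0.05 * rng.standard_normal((15, 15)))
          print(fit_column(patch))   # approximately (7.3, 6.8)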

  3. Note on difference spectra for fast extraction of global image information.

    CSIR Research Space (South Africa)

    Van Wyk, BJ

    2007-06-01

    Full Text Available NOTE ON DIFFERENCE SPECTRA FOR FAST EXTRACTION OF GLOBAL IMAGE INFORMATION. B.J. van Wyk*, M.A. van Wyk* and F. van den Bergh**. *French South African Technical Institute in Electronics (F'SATIE) at the Tshwane University of Technology, Private Bag X680, Pretoria 0001. **Remote Sensing Research Group, Meraka Institute...

  4. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Li, Weizhong

    2011-10-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. The microbiome of Brazilian mangrove sediments as revealed by metagenomics.

    Directory of Open Access Journals (Sweden)

    Fernando Dini Andreote

    Full Text Available Here we embark on a deep metagenomic survey that revealed the taxonomic and potential metabolic pathway aspects of mangrove sediment microbiology. The extraction of DNA from sediment samples and the direct application of pyrosequencing resulted in approximately 215 Mb of data from four distinct mangrove areas (BrMgv01 to 04) in Brazil. The taxonomic approaches applied revealed the dominance of Deltaproteobacteria and Gammaproteobacteria in the samples. Paired statistical analysis showed higher proportions of specific taxonomic groups in each dataset. The metabolic reconstruction indicated the possible occurrence of processes modulated by the prevailing conditions found in mangrove sediments. In terms of carbon cycling, the sequences indicated the prevalence of genes involved in the metabolism of methane, formaldehyde, and carbon dioxide. With respect to the nitrogen cycle, evidence for sequences associated with dissimilatory reduction of nitrate, nitrogen immobilization, and denitrification was detected. Sequences related to the production of adenylsulfate, sulfite, and H2S were relevant to the sulphur cycle. These data indicate that the microbial core involved in methane, nitrogen, and sulphur metabolism consists mainly of Burkholderiaceae, Planctomycetaceae, Rhodobacteraceae, and Desulfobacteraceae. Comparison of our data to datasets from soil and sea samples resulted in the allotment of the mangrove sediments between those samples. The results of this study add valuable data about the composition of microbial communities in mangroves and also shed light on possible transformations promoted by microbial organisms in mangrove sediments.

  6. The microbiome of Brazilian mangrove sediments as revealed by metagenomics.

    Science.gov (United States)

    Andreote, Fernando Dini; Jiménez, Diego Javier; Chaves, Diego; Dias, Armando Cavalcante Franco; Luvizotto, Danice Mazzer; Dini-Andreote, Francisco; Fasanella, Cristiane Cipola; Lopez, Maryeimy Varon; Baena, Sandra; Taketani, Rodrigo Gouvêa; de Melo, Itamar Soares

    2012-01-01

    Here we embark on a deep metagenomic survey that revealed the taxonomic and potential metabolic pathway aspects of mangrove sediment microbiology. The extraction of DNA from sediment samples and the direct application of pyrosequencing resulted in approximately 215 Mb of data from four distinct mangrove areas (BrMgv01 to 04) in Brazil. The taxonomic approaches applied revealed the dominance of Deltaproteobacteria and Gammaproteobacteria in the samples. Paired statistical analysis showed higher proportions of specific taxonomic groups in each dataset. The metabolic reconstruction indicated the possible occurrence of processes modulated by the prevailing conditions found in mangrove sediments. In terms of carbon cycling, the sequences indicated the prevalence of genes involved in the metabolism of methane, formaldehyde, and carbon dioxide. With respect to the nitrogen cycle, evidence for sequences associated with dissimilatory reduction of nitrate, nitrogen immobilization, and denitrification was detected. Sequences related to the production of adenylsulfate, sulfite, and H2S were relevant to the sulphur cycle. These data indicate that the microbial core involved in methane, nitrogen, and sulphur metabolism consists mainly of Burkholderiaceae, Planctomycetaceae, Rhodobacteraceae, and Desulfobacteraceae. Comparison of our data to datasets from soil and sea samples resulted in the allotment of the mangrove sediments between those samples. The results of this study add valuable data about the composition of microbial communities in mangroves and also shed light on possible transformations promoted by microbial organisms in mangrove sediments.

  7. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software.

    Science.gov (United States)

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J; Chia, Burton K H; Denis, Bertrand; Froula, Jeff L; Wang, Zhong; Egan, Robert; Don Kang, Dongwan; Cook, Jeffrey J; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael D; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z; Cuevas, Daniel A; Edwards, Robert A; Saha, Surya; Piro, Vitor C; Renard, Bernhard Y; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C; Woyke, Tanja; Vorholt, Julia A; Schulze-Lefert, Paul; Rubin, Edward M; Darling, Aaron E; Rattei, Thomas; McHardy, Alice C

    2017-11-01

    Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

  8. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2015-05-29

    Modern microbial mats are potential analogues of some of Earth’s earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted for the first time shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications, using protein-coding and small subunit rRNA genes directly extracted from the metagenomes, suggests that three phyla (Proteobacteria, Cyanobacteria and Bacteroidetes) dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative, non-RuBisCO-based carbon metabolism, including the reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways, is highly represented in Shark Bay metagenomes but not in Highbourne Cay microbial mats or any other mat-forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats.

  9. Analysis Methods for Extracting Knowledge from Large-Scale WiFi Monitoring to Inform Building Facility Planning

    DEFF Research Database (Denmark)

    Ruiz-Ruiz, Antonio; Blunck, Henrik; Prentow, Thor Siiger

    2014-01-01

    The optimization of logistics in large building complexes with many resources, such as hospitals, requires realistic facility management and planning. Current planning practices rely foremost on manual observations or coarse, unverified assumptions and therefore do not properly scale or provide realistic data to inform facility planning. In this paper, we propose analysis methods to extract knowledge from large sets of network-collected WiFi traces to better inform facility management and planning in large building complexes. The analysis methods build on a rich set of temporal and spatial features. Spatio-temporal visualization tools built on top of these methods enable planners to inspect and explore the extracted information to inform facility-planning activities. To evaluate the methods, we present results for a large hospital complex covering more than 10 hectares. The evaluation is based on WiFi traces collected in this complex.

  10. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs

    Directory of Open Access Journals (Sweden)

    Erica C. Pehrsson

    2013-06-01

    Full Text Available Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens.

  11. Comparative Metagenomics of Freshwater Microbial Communities

    International Nuclear Information System (INIS)

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-01-01

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (∼160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete B12 biosynthesis pathway at high abundance, suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  12. Extraction as a source of additional information when concentrations in multicomponent systems are simultaneously determined

    International Nuclear Information System (INIS)

    Perkov, I.G.

    1988-01-01

    Using the photometric determination of Nd and Sm in mixtures as an example, the possibility of exploiting the influence of extraction to enhance the analytical signal is considered. It is shown that interligand exchange in extracts, combined with simultaneous determination of concentrations, can be used as a simple means of increasing the accuracy of determination. 5 refs.; 2 figs.; 3 tabs

  13. Chronic Meningitis Investigated via Metagenomic Next-Generation Sequencing

    Science.gov (United States)

    O’Donovan, Brian D.; Gelfand, Jeffrey M.; Sample, Hannah A.; Chow, Felicia C.; Betjemann, John P.; Shah, Maulik P.; Richie, Megan B.; Gorman, Mark P.; Hajj-Ali, Rula A.; Calabrese, Leonard H.; Zorn, Kelsey C.; Chow, Eric D.; Greenlee, John E.; Blum, Jonathan H.; Green, Gary; Khan, Lillian M.; Banerji, Debarko; Langelier, Charles; Bryson-Cahn, Chloe; Harrington, Whitney; Lingappa, Jairam R.; Shanbhag, Niraj M.; Green, Ari J.; Brew, Bruce J.; Soldatos, Ariane; Strnad, Luke; Doernberg, Sarah B.; Jay, Cheryl A.; Douglas, Vanja; Josephson, S. Andrew; DeRisi, Joseph L.

    2018-01-01

    Importance Identifying infectious causes of subacute or chronic meningitis can be challenging. Enhanced, unbiased diagnostic approaches are needed. Objective To present a case series of patients with diagnostically challenging subacute or chronic meningitis using metagenomic next-generation sequencing (mNGS) of cerebrospinal fluid (CSF) supported by a statistical framework generated from mNGS of control samples from the environment and from patients who were noninfectious. Design, Setting, and Participants In this case series, mNGS data obtained from the CSF of 94 patients with noninfectious neuroinflammatory disorders and from 24 water and reagent control samples were used to develop and implement a weighted scoring metric based on z scores at the species and genus levels for both nucleotide and protein alignments to prioritize and rank the mNGS results. Total RNA was extracted for mNGS from the CSF of 7 participants with subacute or chronic meningitis who were recruited between September 2013 and March 2017 as part of a multicenter study of mNGS pathogen discovery among patients with suspected neuroinflammatory conditions. The neurologic infections identified by mNGS in these 7 participants represented a diverse array of pathogens. The patients were referred from the University of California, San Francisco Medical Center (n = 2), Zuckerberg San Francisco General Hospital and Trauma Center (n = 2), Cleveland Clinic (n = 1), University of Washington (n = 1), and Kaiser Permanente (n = 1). A weighted z score was used to filter out environmental contaminants and facilitate efficient data triage and analysis. Main Outcomes and Measures Pathogens identified by mNGS and the ability of a statistical model to prioritize, rank, and simplify mNGS results. Results The 7 participants ranged in age from 10 to 55 years, and 3 (43%) were female. A parasitic worm (Taenia solium, in 2 participants), a virus (HIV-1), and 4 fungi (Cryptococcus neoformans
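
    A much-simplified sketch of the contaminant-filtering idea is given below; it is not the study's weighted scoring metric, and the species names and counts are invented, but it shows how per-species read counts can be converted into z scores against water and reagent controls so that common contaminants rank low:

      # Simplified illustration of z-score-based contaminant filtering for mNGS.
      import numpy as np

      def z_scores(sample_counts, control_counts):
          """sample_counts: dict species -> reads in the patient sample.
          control_counts: dict species -> list of reads across control samples."""
          scores = {}
          for sp, n in sample_counts.items():
              ctrl = np.array(control_counts.get(sp, [0.0]), dtype=float)
              mu, sd = ctrl.mean(), ctrl.std()
              scores[sp] = (n - mu) / (sd if sd > 0 else 1.0)
          return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

      if __name__ == "__main__":
          # Hypothetical counts, for demonstration only.
          sample = {"Taenia solium": 420, "Cutibacterium acnes": 35, "Ralstonia sp.": 12}
          controls = {"Cutibacterium acnes": [30, 42, 28, 37],   # common skin/reagent contaminant
                      "Ralstonia sp.": [10, 15, 9, 13]}          # common water contaminant
          for sp, z in z_scores(sample, controls).items():
              print(f"{sp:22s} z = {z:6.1f}")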

  14. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

    Directory of Open Access Journals (Sweden)

    Emma L. Doughty

    2014-09-01

    Full Text Available Tuberculosis remains a major global health problem. Laboratory diagnostic methods that allow effective, early detection of cases are central to management of tuberculosis in the individual patient and in the community. Since the 1880s, laboratory diagnosis of tuberculosis has relied primarily on microscopy and culture. However, microscopy fails to provide species- or lineage-level identification and culture-based workflows for diagnosis of tuberculosis remain complex, expensive, slow, technically demanding and poorly able to handle mixed infections. We therefore explored the potential of shotgun metagenomics, sequencing of DNA from samples without culture or target-specific amplification or capture, to detect and characterise strains from the Mycobacterium tuberculosis complex in smear-positive sputum samples obtained from The Gambia in West Africa. Eight smear- and culture-positive sputum samples were investigated using a differential-lysis protocol followed by a kit-based DNA extraction method, with sequencing performed on a benchtop sequencing instrument, the Illumina MiSeq. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238. The proportion of reads in each metagenome mapping against the human genome ranged from 20% to 99%. We were able to detect sequences from the M. tuberculosis complex in all eight samples, with coverage of the H37Rv reference genome ranging from 0.002X to 0.7X. By analysing the distribution of large sequence polymorphisms (deletions) and the locations of the insertion element IS6110 and single nucleotide polymorphisms (SNPs), we were able to assign seven of eight metagenome-derived genomes to a species and lineage within the M. tuberculosis complex. Two metagenome-derived mycobacterial genomes were assigned to M. africanum, a species largely confined to West Africa; the others that could be assigned belonged to lineages T, H or LAM within the clade of “modern” M. tuberculosis

  15. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Stepanauskas, Ramunas

    2011-10-13

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  17. Metagenomic mining of feruloyl esterases from termite enteric flora

    CSIR Research Space (South Africa)

    Rashamuse, K

    2014-01-01

    Full Text Available A metagenome expression library was created from Trinervitermes trinervoides termite hindgut symbionts and subsequently screened for feruloyl esterase (FAE) activities, resulting in seven recombinant fosmids conferring feruloyl esterase phenotypes...

  18. Towards diagnostic metagenomics of Campylobacter in fecal samples

    DEFF Research Database (Denmark)

    Andersen, Sandra Christine; Kiil, Kristoffer; Harder, Christoffer Bugge

    2017-01-01

    The development of diagnostic metagenomics is driven by the need for universal, culture-independent methods for detection and characterization of pathogens to substitute the time-consuming, organism-specific, and often culture-based laboratory procedures for epidemiological source-tracing. Some of the challenges in diagnostic metagenomics are that it requires great next-generation sequencing depth and largely unautomated data analysis. DNA from human fecal samples spiked with 7.75 × 10^1 to 7.75 × 10^7 colony forming units (CFU)/ml Campylobacter jejuni and chicken fecal samples spiked with 1 × 10^2 to 1 × 10^6 CFU... Campylobacter in all the clinical samples. Sensitivity in diagnostic metagenomics is improving and has reached a clinically relevant level. There are still challenges to overcome before real-time diagnostic metagenomics can replace quantitative polymerase chain reaction (qPCR) or culture-based surveillance...

  19. Validation and extraction of molecular-geometry information from small-molecule databases.

    Science.gov (United States)

    Long, Fei; Nicholls, Robert A; Emsley, Paul; Gražulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N

    2017-02-01

    A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.

  20. Extracting respiratory information from seismocardiogram signals acquired on the chest using a miniature accelerometer

    International Nuclear Information System (INIS)

    Pandia, Keya; Inan, Omer T; Kovacs, Gregory T A; Giovangrandi, Laurent

    2012-01-01

    Seismocardiography (SCG) is a non-invasive measurement of the vibrations of the chest caused by the heartbeat. SCG signals can be measured using a miniature accelerometer attached to the chest, and are thus well-suited for unobtrusive and long-term patient monitoring. Additionally, SCG contains information relating to both cardiovascular and respiratory systems. In this work, algorithms were developed for extracting three respiration-dependent features of the SCG signal: intensity modulation, timing interval changes within each heartbeat, and timing interval changes between successive heartbeats. Simultaneously with a reference respiration belt, SCG signals were measured from 20 healthy subjects and a respiration rate was estimated using each of the three SCG features and the reference signal. The agreement between each of the three accelerometer-derived respiration rate measurements was computed with respect to the respiration rate derived from the reference respiration belt. The respiration rate obtained from the intensity modulation in the SCG signal was found to be in closest agreement with the respiration rate obtained from the reference respiration belt: the bias was found to be 0.06 breaths per minute with a 95% confidence interval of −0.99 to 1.11 breaths per minute. The limits of agreement between the respiration rates estimated using SCG (intensity modulation) and the reference were within the clinically relevant ranges given in existing literature, demonstrating that SCG could be used for both cardiovascular and respiratory monitoring. Furthermore, phases of each of the three SCG parameters were investigated at four instances of a respiration cycle—start inspiration, peak inspiration, start expiration, and peak expiration—and during breath hold (apnea). The phases of the three SCG parameters observed during the respiration cycle were congruent with existing literature and physiologically expected trends. (paper)
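
    The intensity-modulation feature can be illustrated with the following Python sketch (a generic envelope-plus-spectrum approach applied to a synthetic amplitude-modulated signal; it is not the authors' algorithm and the signal parameters are invented):

      # Minimal sketch: estimate a respiration rate from the amplitude (intensity)
      # modulation of an SCG-like signal via its envelope's dominant frequency.
      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt, welch

      def respiration_rate_bpm(scg, fs):
          envelope = np.abs(hilbert(scg))                    # intensity modulation
          b, a = butter(2, 1.0 / (fs / 2), btype="low")      # keep < 1 Hz (respiration)
          env_lp = filtfilt(b, a, envelope)
          f, pxx = welch(env_lp, fs=fs, nperseg=min(len(env_lp), 4096))
          band = (f >= 0.1) & (f <= 0.7)                     # 6-42 breaths/min
          return 60.0 * f[band][np.argmax(pxx[band])]

      if __name__ == "__main__":
          fs, dur = 100, 60
          t = np.arange(0, dur, 1 / fs)
          heart = np.sin(2 * np.pi * 1.2 * t)                   # ~72 bpm "heartbeat"
          resp_mod = 1.0 + 0.3 * np.sin(2 * np.pi * 0.25 * t)   # 15 breaths/min modulation
          scg = resp_mod * heart + 0.05 * np.random.default_rng(3).standard_normal(t.size)
          print(round(respiration_rate_bpm(scg, fs), 1), "breaths/min")  # ~15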

  1. Extracting key information from historical data to quantify the transmission dynamics of smallpox

    Directory of Open Access Journals (Sweden)

    Brockmann Stefan O

    2008-08-01

    Full Text Available Abstract Background Quantification of the transmission dynamics of smallpox is crucial for optimizing intervention strategies in the event of a bioterrorist attack. This article reviews basic methods and findings in mathematical and statistical studies of smallpox which estimate key transmission parameters from historical data. Main findings First, critically important aspects in extracting key information from historical data are briefly summarized. We mention different sources of heterogeneity and potential pitfalls in utilizing historical records. Second, we discuss how smallpox spreads in the absence of interventions and how the optimal timing of quarantine and isolation measures can be determined. Case studies demonstrate the following. (1) The upper confidence limit of the 99th percentile of the incubation period is 22.2 days, suggesting that quarantine should last 23 days. (2) The highest frequency (61.8%) of secondary transmissions occurs 3–5 days after onset of fever so that infected individuals should be isolated before the appearance of rash. (3) The U-shaped age-specific case fatality implies a vulnerability of infants and elderly among non-immune individuals. Estimates of the transmission potential are subsequently reviewed, followed by an assessment of vaccination effects and of the expected effectiveness of interventions. Conclusion Current debates on bio-terrorism preparedness indicate that public health decision making must account for the complex interplay and balance between vaccination strategies and other public health measures (e.g. case isolation and contact tracing), taking into account the frequency of adverse events to vaccination. In this review, we summarize what has already been clarified and point out the need to analyze previous smallpox outbreaks systematically.
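
    The quarantine-length reasoning (99th percentile of the incubation period of 22.2 days, hence a 23-day quarantine) can be reproduced in outline by fitting a parametric distribution to incubation data and reading off the upper percentile. The sketch below uses a lognormal fit on made-up incubation periods; the data and the choice of a lognormal are illustrative assumptions, not the historical dataset analysed in the review.

```python
import math
import numpy as np
from scipy import stats

# Hypothetical incubation periods (days) standing in for historical case data.
incubation_days = np.array([10, 11, 12, 12, 13, 13, 14, 14, 14, 15, 15, 16, 17, 18, 19])

# Fit a lognormal distribution (location fixed at zero).
shape, loc, scale = stats.lognorm.fit(incubation_days, floc=0)

# 99th percentile of the fitted distribution.
p99 = stats.lognorm.ppf(0.99, shape, loc=loc, scale=scale)

# A quarantine period would be rounded up to whole days.
print(f"99th percentile: {p99:.1f} days -> quarantine of {math.ceil(p99)} days")
```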

  2. Oral Metagenomic Biomarkers in Rheumatoid Arthritis

    Science.gov (United States)

    2017-09-01

    individuals with rheumatoid arthritis (RA). The goal is to test the hypothesis that oral microbiome and metagenomic analyses will allow us to identify new ... biomarkers that are useful for the diagnosis of early RA and/or biomarkers that help to predict the efficacy of specific therapeutic interventions ... RNA microbiome analysis as well as whole genome shotgun sequencing. Upon completion of these aims, any identified bacterial biomarkers may be

  3. FY11 Report on Metagenome Analysis using Pathogen Marker Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, Shea N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Allen, Jonathan E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); McLoughlin, Kevin S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Slezak, Tom [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2011-06-02

    A method, sequence library, and software suite was invented to rapidly assess whether any member of a pre-specified list of threat organisms or their near neighbors is present in a metagenome. The system was designed to handle mega- to giga-bases of FASTA-formatted raw sequence reads from short or long read next generation sequencing platforms. The approach is to pre-calculate a viral and a bacterial "Pathogen Marker Library" (PML) containing sub-sequences specific to pathogens or their near neighbors. A list of expected matches comparing every bacterial or viral genome against the PML sequences is also pre-calculated. To analyze a metagenome, reads are compared to the PML, and observed PML-metagenome matches are compared to the expected PML-genome matches, and the ratio of observed relative to expected matches is reported. In other words, a 3-way comparison among the PML, metagenome, and existing genome sequences is used to quickly assess which (if any) species included in the PML is likely to be present in the metagenome, based on available sequence data. Our tests showed that the species with the most PML matches correctly indicated the organism sequenced for empirical metagenomes consisting of a cultured, relatively pure isolate. These runs completed in 1 minute to 3 hours on 12 CPU (1 thread/CPU), depending on the metagenome and PML. Using more threads on the same number of CPU resulted in speed improvements roughly proportional to the number of threads. Simulations indicated that detection sensitivity depends on both sequencing coverage levels for a species and the size of the PML: species were correctly detected even at ~0.003x coverage by the large PMLs, and at ~0.03x coverage by the smaller PMLs. Matches to true positive species were 3-4 orders of magnitude higher than to false positives. Simulations with short reads (36 nt and ~260 nt) showed that species were usually detected for metagenome coverage above 0.005x and coverage in the PML above 0.05x, and
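
    The core scoring step is a ratio of observed to expected marker matches per species. A minimal sketch of that bookkeeping, with hypothetical counts and data structures (not the LLNL software itself), might look like this:

```python
# Hypothetical counts of PML marker matches in a metagenome, keyed by species name.
observed_matches = {"Bacillus anthracis": 412, "Yersinia pestis": 3}
# Pre-computed matches expected when each full genome is compared to the PML.
expected_matches = {"Bacillus anthracis": 455, "Yersinia pestis": 520}

def score_species(observed, expected):
    """Rank species by the ratio of observed to expected PML matches."""
    scores = {}
    for species, exp in expected.items():
        obs = observed.get(species, 0)
        scores[species] = obs / exp if exp else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for species, ratio in score_species(observed_matches, expected_matches):
    print(f"{species}: observed/expected = {ratio:.3f}")
```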

  4. Expanding the marine virosphere using metagenomics.

    Directory of Open Access Journals (Sweden)

    Carolina Megumi Mizuno

    Full Text Available Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. The classical approaches studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed description of complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages of which 10 are completely new. Sequence-based methods have allowed host assignment to many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes like members of SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics confirming the remarkable mosaicism of phage genomes.

  5. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  6. Evaluation of needle trap micro-extraction and solid-phase micro-extraction: Obtaining comprehensive information on volatile emissions from in vitro cultures.

    Science.gov (United States)

    Oertel, Peter; Bergmann, Andreas; Fischer, Sina; Trefz, Phillip; Küntzel, Anne; Reinhold, Petra; Köhler, Heike; Schubert, Jochen K; Miekisch, Wolfram

    2018-05-14

    Volatile organic compounds (VOCs) emitted from in vitro cultures may reveal information on species and metabolism. Owing to low nmol L-1 concentration ranges, pre-concentration techniques are required for gas chromatography-mass spectrometry (GC-MS) based analyses. This study was intended to compare the efficiency of established micro-extraction techniques - solid-phase micro-extraction (SPME) and needle-trap micro-extraction (NTME) - for the analysis of complex VOC patterns. For SPME, a 75 μm Carboxen®/polydimethylsiloxane fiber was used. The NTME needle was packed with divinylbenzene, Carbopack X and Carboxen 1000. The headspace was sampled bi-directionally. Seventy-two VOCs were calibrated by reference standard mixtures in the range of 0.041-62.24 nmol L-1 by means of GC-MS. Both pre-concentration methods were applied to profile VOCs from cultures of Mycobacterium avium ssp. paratuberculosis. Limits of detection ranged from 0.004 to 3.93 nmol L-1 (median = 0.030 nmol L-1) for NTME and from 0.001 to 5.684 nmol L-1 (median = 0.043 nmol L-1) for SPME. NTME showed advantages in assessing polar compounds such as alcohols. SPME showed advantages in reproducibility but disadvantages in sensitivity for N-containing compounds. Micro-extraction techniques such as SPME and NTME are well suited for trace VOC profiling over cultures if the limitations of each technique are taken into account. Copyright © 2018 John Wiley & Sons, Ltd.

  7. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos [DOE JGI

    2011-10-12

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  8. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    Directory of Open Access Journals (Sweden)

    J. Sharmila

    2016-01-01

    Full Text Available Web mining research is becoming increasingly important because a large amount of information is managed through the web, and web usage is growing in an uncontrolled way; a dedicated framework is therefore required to handle such large volumes of information in the web space. Web mining is commonly divided into three major areas: web content mining, web usage mining and web structure mining. Tak-Lam Wong proposed a web content mining methodology based on Bayesian Networks (BN), in which web data extraction and attribute discovery are learned within a Bayesian framework. Motivated by that work, we propose a web content mining methodology based on a deep learning algorithm. Deep learning is preferred over BN because BN does not provide the kind of learning architecture considered in the proposed system. The main objective of this investigation is web document extraction using different classification algorithms and their comparative analysis. The work extracts data from web URLs and applies three classification algorithms: a deep learning algorithm, a naïve Bayes algorithm and a back-propagation neural network (BPNN) algorithm. Deep learning is a powerful family of techniques for learning in neural networks, applied in areas such as computer vision, speech recognition, natural language processing and biometrics; it is also a comparatively simple classification technique with low classification time. Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features. The BPNN algorithm is then used for classification. The training and testing datasets consist of a collection of URLs from which the textual content is extracted.
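
    As a generic illustration of this kind of comparison (not the authors' implementation), scikit-learn can be used to compare a naïve Bayes classifier with a neural network trained by back-propagation on TF-IDF features; the documents, labels and network size below are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Placeholder web documents and topic labels.
docs = ["stock market rises", "team wins final match", "new phone released",
        "shares fall sharply", "player scores twice", "gadget review published"] * 5
labels = ["finance", "sport", "tech", "finance", "sport", "tech"] * 5

# TF-IDF features extracted from the document text.
X = TfidfVectorizer().fit_transform(docs)

classifiers = {
    "naive Bayes": MultinomialNB(),
    "back-propagation neural network": MLPClassifier(
        hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, labels, cv=3)
    print(f"{name}: mean accuracy = {scores.mean():.2f}")
```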

  9. Functional Screening of Antibiotic Resistance Genes from a Representative Metagenomic Library of Food Fermenting Microbiota

    Directory of Open Access Journals (Sweden)

    Chiara Devirgiliis

    2014-01-01

    Full Text Available Lactic acid bacteria (LAB) represent the predominant microbiota in fermented foods. Foodborne LAB have received increasing attention as potential reservoir of antibiotic resistance (AR) determinants, which may be horizontally transferred to opportunistic pathogens. We have previously reported isolation of AR LAB from the raw ingredients of a fermented cheese, while AR genes could be detected in the final, marketed product only by PCR amplification, thus pointing at the need for more sensitive microbial isolation techniques. We turned therefore to construction of a metagenomic library containing microbial DNA extracted directly from the food matrix. To maximize yield and purity and to ensure that genomic complexity of the library was representative of the original bacterial population, we defined a suitable protocol for total DNA extraction from cheese which can also be applied to other lipid-rich foods. Functional library screening on different antibiotics allowed recovery of ampicillin and kanamycin resistant clones originating from Streptococcus salivarius subsp. thermophilus and Lactobacillus helveticus genomes. We report molecular characterization of the cloned inserts, which were fully sequenced and shown to confer AR phenotype to recipient bacteria. We also show that metagenomics can be applied to food microbiota to identify underrepresented species carrying specific genes of interest.

  10. Information extraction from dynamic PS-InSAR time series using machine learning

    Science.gov (United States)

    van de Kerkhof, B.; Pankratius, V.; Chang, L.; van Swol, R.; Hanssen, R. F.

    2017-12-01

    Due to the increasing number of SAR satellites, with shorter repeat intervals and higher resolutions, SAR data volumes are exploding. Time series analyses of SAR data, i.e. Persistent Scatterer (PS) InSAR, enable the deformation monitoring of the built environment at an unprecedented scale, with hundreds of scatterers per km2, updated weekly. Potential hazards, e.g. due to failure of aging infrastructure, can be detected at an early stage. Yet, this requires the operational data processing of billions of measurement points, over hundreds of epochs, updating this data set dynamically as new data come in, and testing whether points (start to) behave in an anomalous way. Moreover, the quality of PS-InSAR measurements is ambiguous and heterogeneous, which will yield false positives and false negatives. Such analyses are numerically challenging. Here we extract relevant information from PS-InSAR time series using machine learning algorithms. We cluster (group together) time series with similar behaviour, even though they may not be spatially close, such that the results can be used for further analysis. First we reduce the dimensionality of the dataset in order to be able to cluster the data, since applying clustering techniques to high-dimensional datasets often yields unsatisfactory results. Our approach is to apply t-distributed Stochastic Neighbor Embedding (t-SNE), a machine learning algorithm for dimensionality reduction of high-dimensional data to a 2D or 3D map, and cluster this result using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results show that we are able to detect and cluster time series with similar behaviour, which is the starting point for more extensive analysis into the underlying driving mechanisms. The results of the methods are compared to conventional hypothesis testing as well as a Self-Organising Map (SOM) approach. Hypothesis testing is robust and takes the stochastic nature of the observations into account.
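
    A minimal sketch of the dimensionality-reduction-then-clustering step described above, using scikit-learn on a synthetic stack of time series; the array shapes, perplexity and DBSCAN parameters are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic PS time series: 300 points x 50 epochs, three behaviour groups.
stable   = rng.normal(0.0, 1.0, (100, 50))                              # noise only
linear   = np.cumsum(rng.normal(-0.5, 1.0, (100, 50)), axis=1)          # steady subsidence
seasonal = 5 * np.sin(np.linspace(0, 4 * np.pi, 50)) + rng.normal(0, 1, (100, 50))
series = np.vstack([stable, linear, seasonal])

# Reduce each time series to a 2-D embedding, then cluster by density.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(series)
clusters = DBSCAN(eps=3.0, min_samples=10).fit_predict(embedding)

print("cluster labels found:", sorted(set(clusters)))
```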

  11. Synthesis of High-Frequency Ground Motion Using Information Extracted from Low-Frequency Ground Motion

    Science.gov (United States)

    Iwaki, A.; Fujiwara, H.

    2012-12-01

    Broadband ground motion computations of scenario earthquakes are often based on hybrid methods that are the combinations of deterministic approach in lower frequency band and stochastic approach in higher frequency band. Typical computation methods for low-frequency and high-frequency (LF and HF, respectively) ground motions are the numerical simulations, such as finite-difference and finite-element methods based on three-dimensional velocity structure model, and the stochastic Green's function method, respectively. In such hybrid methods, LF and HF wave fields are generated through two different methods that are completely independent of each other, and are combined at the matching frequency. However, LF and HF wave fields are essentially not independent as long as they are from the same event. In this study, we focus on the relation among acceleration envelopes at different frequency bands, and attempt to synthesize HF ground motion using the information extracted from LF ground motion, aiming to propose a new method for broad-band strong motion prediction. Our study area is Kanto area, Japan. We use the K-NET and KiK-net surface acceleration data and compute RMS envelopes at five frequency bands: 0.5-1.0 Hz, 1.0-2.0 Hz, 2.0-4.0 Hz, 4.0-8.0 Hz, and 8.0-16.0 Hz. Taking the ratio of the envelopes of adjacent bands, we find that the envelope ratios have stable shapes at each site. The empirical envelope-ratio characteristics are combined with the low-frequency envelope of the target earthquake to synthesize HF ground motion. We have applied the method to M5-class earthquakes and a M7 target earthquake that occurred in the vicinity of Kanto area, and successfully reproduced the observed HF ground motion of the target earthquake. The method can be applied to a broad-band ground motion simulation for a scenario earthquake by combining numerically-computed low-frequency (~1 Hz) ground motion with the empirical envelope-ratio characteristics to generate broadband ground motion.
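
    The envelope-ratio idea can be sketched as follows: band-pass the record into adjacent bands, form smoothed RMS envelopes, and scale the low-frequency envelope by an empirical ratio to approximate the higher band. The band edges, smoothing window and synthetic record below are assumptions for illustration, not the authors' processing parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_envelope(x, fs, lo, hi, win_s=1.0):
    """Smoothed RMS envelope of x in the band [lo, hi] Hz."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)
    win = int(win_s * fs)
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(y**2, kernel, mode="same"))

fs = 100
t = np.arange(0, 40, 1 / fs)
# Synthetic accelerogram: noise with a smooth amplitude envelope.
accel = np.random.default_rng(1).normal(size=t.size) * np.exp(-((t - 10) / 8) ** 2)

env_lf = band_envelope(accel, fs, 0.5, 1.0)   # low-frequency band
env_hf = band_envelope(accel, fs, 4.0, 8.0)   # target high-frequency band

# Empirical envelope ratio (here from the same record; in practice from past events).
ratio = np.median(env_hf / np.maximum(env_lf, 1e-12))

# Synthesized high-frequency envelope obtained from the low-frequency one.
env_hf_synth = ratio * env_lf
print("median envelope ratio:", round(float(ratio), 3))
```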

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  13. TempoWordNet: a lexical resource for temporal information extraction

    OpenAIRE

    Hasanuzzaman , Mohammed

    2016-01-01

    The ability to capture the time information conveyed in natural language, where that information is expressed explicitly, implicitly or connotatively, is essential to many natural language processing applications such as information retrieval, question answering, automatic summarization, targeted marketing, loan repayment forecasting, and understanding economic patterns. Associating word senses with temporal orientation to grasp the temporal information in language is relatively stra...

  14. Microbial community profiling of human saliva using shotgun metagenomic sequencing.

    Directory of Open Access Journals (Sweden)

    Nur A Hasan

    Full Text Available Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify bacterial species using unassembled short NGS reads, was used to identify the bacterial species comprising the microbiomes of the saliva samples and datasets. Results, achieved within minutes and at greater than 90% accuracy, showed more than 175 bacterial species comprised the bacterial flora of human saliva, including bacteria known to be commensal human flora but also Haemophilus influenzae, Neisseria meningitidis, Streptococcus pneumoniae, and Gamma proteobacteria. Basic Local Alignment Search Tool (BLASTn) analysis, in parallel, reported ca. five times more species than those actually comprising the in silico sample. Both GENIUS and BLAST analyses of saliva samples identified major genera comprising the bacterial flora of saliva, but GENIUS provided a more precise description of species composition, identifying to strain in most cases and delivered results at least 10,000 times faster. Therefore, GENIUS offers a facile and accurate system for identification and quantification of bacterial species and/or strains in metagenomic samples.

  15. Metagenomic screening for aromatic compound-responsive transcriptional regulators.

    Directory of Open Access Journals (Sweden)

    Taku Uchiyama

    Full Text Available We applied a metagenomics approach to screen for transcriptional regulators that sense aromatic compounds. The library was constructed by cloning environmental DNA fragments into a promoter-less vector containing green fluorescence protein. Fluorescence-based screening was then performed in the presence of various aromatic compounds. A total of 12 clones were isolated that fluoresced in response to salicylate, 3-methyl catechol, 4-chlorocatechol and chlorohydroquinone. Sequence analysis revealed at least 1 putative transcriptional regulator in each clone, excluding 1 clone (CHLO8F). Deletion analysis identified compound-specific transcriptional regulators; namely, 8 LysR-types, 2 two-component-types and 1 AraC-type. Of these, 9 representative clones were selected and their reaction specificities to 18 aromatic compounds were investigated. Overall, our transcriptional regulators were functionally diverse in terms of both specificity and induction rates. LysR- and AraC-type regulators had relatively narrow specificities with high induction rates (5-50 fold), whereas two-component-types had wide specificities with low induction rates (3 fold). Numerous transcriptional regulators have been deposited in sequence databases, but their functions remain largely unknown. Thus, our results add valuable information regarding the sequence-function relationship of transcriptional regulators.

  16. Microbiological profile of chicken carcasses: A comparative analysis using shotgun metagenomic sequencing

    Directory of Open Access Journals (Sweden)

    Alessandra De Cesare

    2018-04-01

    Full Text Available In the last few years metagenomic and 16S rRNA sequencing have completely changed the microbiological investigation of food products. In this preliminary study, the microbiological profiles of chicken carcasses collected from animals fed with different diets were tested by using shotgun metagenomic sequencing. A total of 15 carcasses were collected at the slaughterhouse at the end of the refrigeration tunnel from chickens reared for 35 days and fed with a control diet (n=5), a diet supplemented with 1500 FTU/kg of commercial phytase (n=5) and a diet supplemented with 1500 FTU/kg of commercial phytase and 3 g/kg of inositol (n=5). Ten grams of neck and breast skin were obtained from each carcass and submitted to total DNA extraction by using the DNeasy Blood & Tissue Kit (Qiagen). Sequencing libraries were prepared by using the Nextera XT DNA Library Preparation Kit (Illumina) and sequenced on a HiScanSQ (Illumina) at 100 bp in paired ends. Between 5 and 9 million sequences were obtained for each sample. Sequence analysis showed that Proteobacteria and Firmicutes represented more than 98% of the whole bacterial populations associated with carcass skin in all groups, but their abundances were different between groups. Moraxellaceae and other degradative bacteria showed a significantly higher abundance in the control compared to the treated groups. Furthermore, Clostridium perfringens showed a relative frequency of abundance significantly higher in the group fed with phytase, and Salmonella enterica in the group fed with phytase plus inositol. The results of this preliminary study showed that metagenome sequencing is suitable to investigate and monitor carcass microbiota in order to detect specific pathogenic and/or degradative populations.

  17. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library

    Directory of Open Access Journals (Sweden)

    Parachin Nádia

    2011-05-01

    Full Text Available Abstract Background Xylose isomerase (XI) catalyses the isomerisation of xylose to xylulose in bacteria and some fungi. Currently, only a limited number of XI genes have been functionally expressed in Saccharomyces cerevisiae, the microorganism of choice for lignocellulosic ethanol production. The objective of the present study was to search for novel XI genes in the vastly diverse microbial habitat present in soil. As the exploitation of microbial diversity is impaired by the inability to cultivate most soil microorganisms under standard laboratory conditions, a metagenomic approach, consisting of total DNA extraction from a given environment followed by cloning of DNA into suitable vectors, was undertaken. Results A soil metagenomic library was constructed and two screening methods based on protein sequence similarity and enzyme activity were investigated to isolate novel XI encoding genes. These two screening approaches identified the xym1 and xym2 genes, respectively. Sequence and phylogenetic analyses revealed that the genes shared 67% similarity and belonged to different bacterial groups. When xym1 and xym2 were overexpressed in a xylA-deficient Escherichia coli strain, similar growth rates to those in which the Piromyces XI gene was expressed were obtained. However, expression in S. cerevisiae resulted in only one-fourth the growth rate of that obtained for the strain expressing the Piromyces XI gene. Conclusions For the first time, the screening of a soil metagenomic library in E. coli resulted in the successful isolation of two active XIs. However, the discrepancy between XI enzyme performance in E. coli and S. cerevisiae suggests that future screening for XI activity from soil should be pursued directly using yeast as a host.

  18. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction

    OpenAIRE

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2011-01-01

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scien...

  19. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

    OpenAIRE

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboa...

  20. An Investigation of the Relationship Between Automated Machine Translation Evaluation Metrics and User Performance on an Information Extraction Task

    Science.gov (United States)

    2007-01-01

    more reliable than BLEU and that it is easier to understand in terms familiar to NLP researchers. ... METEOR: researchers at Carnegie Mellon ... essential elements of information from output generated by three types of Arabic-English MT engines. The information extraction experiment was one of three ... reviewing the task hierarchy and examining the MT output of several engines. A small, prior pilot experiment to evaluate Arabic-English MT engines for

  1. Comparison of Qinzhou bay wetland landscape information extraction by three methods

    Directory of Open Access Journals (Sweden)

    X. Chang

    2014-04-01

    and OO is 219 km2, 193.70 km2, 217.40 km2 respectively. The result indicates that SC ranks first, followed by the OO approach, with the DT method third, when used to extract Qinzhou Bay coastal wetland.

  2. Extracting topographic structure from digital elevation data for geographic information-system analysis

    Science.gov (United States)

    Jenson, Susan K.; Domingue, Julia O.

    1988-01-01

    Software tools have been developed at the U.S. Geological Survey's EROS Data Center to extract topographic structure and to delineate watersheds and overland flow paths from digital elevation models. The tools are special-purpose FORTRAN programs interfaced with general-purpose raster and vector spatial analysis and relational data base management packages.

  3. Microbiota composition, gene pool and its expression in Gir cattle (Bos indicus) rumen under different forage diets using metagenomic and metatranscriptomic approaches.

    Science.gov (United States)

    Pandit, Ramesh J; Hinsu, Ankit T; Patel, Shriram H; Jakhesara, Subhash J; Koringa, Prakash G; Bruno, Fosso; Psifidi, Androniki; Shah, S V; Joshi, Chaitanya G

    2018-03-09

    Zebu (Bos indicus) is a domestic cattle species originating from the Indian subcontinent and now widely domesticated on several continents. In this study, we were particularly interested in understanding the functionally active rumen microbiota of an important Zebu breed, the Gir, under different dietary regimes. Metagenomic and metatranscriptomic data were compared at various taxonomic levels to elucidate the differential microbial population and its functional dynamics in Gir cattle rumen under different roughage dietary regimes. Different proportions of roughage, rather than the type of roughage (dry or green), modulated microbiome composition and the expression of its gene pool. Fibre degrading bacteria (i.e. Clostridium, Ruminococcus, Eubacterium, Butyrivibrio, Bacillus and Roseburia) were significantly more abundant in the solid fraction of the rumen. Comparison of metagenomic shotgun and metatranscriptomic sequencing appeared to be a much richer source of information compared to conventional metagenomic analysis. Copyright © 2018 Elsevier GmbH. All rights reserved.

  4. Systematically extracting metal- and solvent-related occupational information from free-text responses to lifetime occupational history questionnaires.

    Science.gov (United States)

    Friesen, Melissa C; Locke, Sarah J; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A; Purdue, Mark; Colt, Joanne S

    2014-06-01

    Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants' jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Our study population comprised 2408 subjects, reporting 11991 jobs, from a case-control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert's independently assigned probability ratings to evaluate whether we missed identifying
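
    The core of the approach is a programmable keyword/code match over each job record. A stripped-down sketch (with invented keyword lists, occupation codes and field names, not the study's SAS macro) is shown below.

```python
import re

# Hypothetical a-priori lists for a trichloroethylene (TCE) exposure scenario.
TCE_KEYWORDS = [r"\bdegreas\w*", r"\bvapou?r degreaser\b",
                r"\btrichloroethylene\b", r"\btce\b"]
TCE_OCCUPATION_CODES = {"8060", "7632"}  # invented standardized occupation codes

def flag_tce_scenario(job):
    """Return True if a job record matches the TCE exposure scenario."""
    free_text = " ".join([job.get("task", ""), job.get("tools", ""),
                          job.get("chemicals", "")]).lower()
    keyword_hit = any(re.search(pattern, free_text) for pattern in TCE_KEYWORDS)
    code_hit = job.get("occupation_code") in TCE_OCCUPATION_CODES
    return keyword_hit or code_hit

jobs = [
    {"occupation_code": "8060", "task": "metal parts cleaning",
     "tools": "vapour degreaser", "chemicals": "TCE"},
    {"occupation_code": "2310", "task": "teaching high school biology",
     "tools": "", "chemicals": ""},
]
print([flag_tce_scenario(j) for j in jobs])  # -> [True, False]
```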

  5. Metagenomic analysis of phosphorus removing sludgecommunities

    Energy Technology Data Exchange (ETDEWEB)

    Garcia Martin, Hector; Ivanova, Natalia; Kunin, Victor; Warnecke, Falk; Barry, Kerrie; McHardy, Alice C.; Yeates, Christine; He, Shaomei; Salamov, Asaf; Szeto, Ernest; Dalin, Eileen; Putnam, Nik; Shapiro, Harris J.; Pangilinan, Jasmyn L.; Rigoutsos, Isidore; Kyrpides, Nikos C.; Blackall, Linda Louise; McMahon, Katherine D.; Hugenholtz, Philip

    2006-02-01

    Enhanced Biological Phosphorus Removal (EBPR) is not well understood at the metabolic level despite being one of the best-studied microbially-mediated industrial processes due to its ecological and economic relevance. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium, "Candidatus Accumulibacter phosphatis." This analysis resolves several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much-needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.

  6. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  7. Unsupervised Two-Way Clustering of Metagenomic Sequences

    Directory of Open Access Journals (Sweden)

    Shruthi Prabhakara

    2012-01-01

    Full Text Available A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length and relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer words that are rare. We employ either a mixture of Gaussians or mixture of Poissons to model reads within each bin. Further, we handle the high-dimensionality and sparsity associated with the data, by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bps and is robust to varying abundances, divergences and read lengths.
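
    The sketch below illustrates composition-based binning in the same spirit, but it uses tetranucleotide frequency vectors and an off-the-shelf Gaussian mixture model rather than the authors' two-way naive Bayes formulation; the reads and the number of bins are placeholders.

```python
from itertools import product
import numpy as np
from sklearn.mixture import GaussianMixture

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}

def kmer_profile(read, k=4):
    """Normalized k-mer frequency vector for one read."""
    counts = np.zeros(len(KMERS))
    for i in range(len(read) - k + 1):
        idx = KMER_INDEX.get(read[i:i + k])
        if idx is not None:
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

# Placeholder reads; a real metagenome would supply millions of these.
reads = ["ACGT" * 30, "ACGT" * 25, "ACGTACGG" * 15,
         "GGCC" * 30, "GGCCTA" * 20, "GGCCAT" * 20]
X = np.array([kmer_profile(r) for r in reads])

# Cluster reads into a fixed number of bins (the number of species is assumed known here).
labels = GaussianMixture(n_components=2, covariance_type="diag",
                         random_state=0).fit_predict(X)
print(labels)
```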

  8. Metagenomic analysis of permafrost microbial community response to thaw

    Energy Technology Data Exchange (ETDEWEB)

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related this data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in cycling of C and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  9. Meta-IDBA: a de Novo assembler for metagenomic data.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2011-07-01

    Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performance of these assemblers on metagenomic data is far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which makes the assembly problem much more complicated. We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species using a consensus sequence. Comparison of the performance of Meta-IDBA and existing assemblers, such as Velvet and Abyss, for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. The Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. Contact: chin@cs.hku.hk.
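
    The first core step, partitioning a de Bruijn graph into components, can be illustrated with a toy k-mer graph built directly from reads; this is a drastic simplification of the idea, not Meta-IDBA's actual implementation.

```python
from collections import defaultdict

def kmers(read, k):
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def debruijn_components(reads, k=5):
    """Group reads by the connected component of the k-mer (de Bruijn) graph they touch."""
    # Union-find over (k-1)-mers, which are the nodes of the de Bruijn graph.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)

    for read in reads:
        for kmer in kmers(read, k):
            union(kmer[:-1], kmer[1:])  # each k-mer is an edge between two (k-1)-mers

    groups = defaultdict(list)
    for read in reads:
        first = next(kmers(read, k), None)
        if first is not None:
            groups[find(first[:-1])].append(read)
    return list(groups.values())

reads = ["ACGTACGTAC", "GTACGTACGT",   # overlapping: one component
         "TTTTGGGGCC", "GGGGCCCCAA"]   # overlapping: a second component
for component in debruijn_components(reads, k=5):
    print(component)
```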

  10. MIDAS. An algorithm for the extraction of modal information from experimentally determined transfer functions

    International Nuclear Information System (INIS)

    Durrans, R.F.

    1978-12-01

    In order to design reactor structures to withstand the large flow and acoustic forces present it is necessary to know something of their dynamic properties. In many cases these properties cannot be predicted theoretically and it is necessary to determine them experimentally. The algorithm MIDAS (Modal Identification for the Dynamic Analysis of Structures) which has been developed at B.N.L. for extracting these structural properties from experimental data is described. (author)

  11. Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

    Science.gov (United States)

    Lam, Kathy N; Charles, Trevor C

    2015-01-01

    Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite
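
    A rough way to look for sigma-70-like promoters, in the spirit of the analysis described, is to scan for the canonical -35 (TTGACA) and -10 (TATAAT) hexamers separated by a typical 15-19 bp spacer. The consensus boxes and spacer range below are textbook values used as assumptions, with no mismatches allowed, which is far cruder than a real promoter model.

```python
import re

# Canonical E. coli sigma-70 boxes with a 15-19 bp spacer (no mismatches allowed here).
SIGMA70 = re.compile(r"TTGACA[ACGT]{15,19}TATAAT")

def count_sigma70_like(sequence):
    """Count non-overlapping sigma-70 consensus-like promoter motifs on one strand."""
    return len(SIGMA70.findall(sequence.upper()))

insert_sequence = "GCGC" * 10 + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC" * 10
print(count_sigma70_like(insert_sequence))  # -> 1
```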

  12. Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes.

    Science.gov (United States)

    Becraft, Eric D; Dodsworth, Jeremy A; Murugapiran, Senthil K; Ohlsson, J Ingemar; Briggs, Brandon R; Kanbar, Jad; De Vlaminck, Iwijn; Quake, Stephen R; Dong, Hailiang; Hedlund, Brian P; Swingley, Wesley D

    2016-02-15

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2, or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  13. Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry

    Directory of Open Access Journals (Sweden)

    William P. Inskeep

    2013-05-01

    Full Text Available Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (~40-45 Mbase Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G+C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high temperature systems of YNP.

  14. RNA viral metagenome of whiteflies leads to the discovery and characterization of a whitefly-transmitted carlavirus in North America.

    Science.gov (United States)

    Rosario, Karyna; Capobianco, Heather; Ng, Terry Fei Fan; Breitbart, Mya; Polston, Jane E

    2014-01-01

    Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector.

  15. RNA viral metagenome of whiteflies leads to the discovery and characterization of a whitefly-transmitted carlavirus in North America.

    Directory of Open Access Journals (Sweden)

    Karyna Rosario

    Full Text Available Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector.

  16. Extracting Information about the Initial State from the Black Hole Radiation.

    Science.gov (United States)

    Lochan, Kinjalk; Padmanabhan, T

    2016-02-05

    The crux of the black hole information paradox is related to the fact that the complete information about the initial state of a quantum field in a collapsing spacetime is not available to future asymptotic observers, belying the expectations from a unitary quantum theory. We study the imprints of the initial quantum state contained in a specific class of distortions of the black hole radiation and identify the classes of in states that can be partially or fully reconstructed from the information contained within. Even for the general in state, we can uncover some specific information. These results suggest that a classical collapse scenario ignores this richness of information in the resulting spectrum and a consistent quantum treatment of the entire collapse process might allow us to retrieve much more information from the spectrum of the final radiation.

  17. Metagenomic profiling of microbial composition and antibiotic resistance determinants in Puget Sound.

    Science.gov (United States)

    Port, Jesse A; Wallace, James C; Griffith, William C; Faustman, Elaine M

    2012-01-01

    Human-health relevant impacts on marine ecosystems are increasing on both spatial and temporal scales. Traditional indicators for environmental health monitoring and microbial risk assessment have relied primarily on single species analyses and have provided only limited spatial and temporal information. More high-throughput, broad-scale approaches to evaluate these impacts are therefore needed to provide a platform for informing public health. This study uses shotgun metagenomics to survey the taxonomic composition and antibiotic resistance determinant content of surface water bacterial communities in the Puget Sound estuary. Metagenomic DNA was collected at six sites in Puget Sound in addition to one wastewater treatment plant (WWTP) that discharges into the Sound and pyrosequenced. A total of ~550 Mbp (1.4 million reads) were obtained, 22 Mbp of which could be assembled into contigs. While the taxonomic and resistance determinant profiles across the open Sound samples were similar, unique signatures were identified when comparing these profiles across the open Sound, a nearshore marina and WWTP effluent. The open Sound was dominated by α-Proteobacteria (in particular Rhodobacterales sp.), γ-Proteobacteria and Bacteroidetes while the marina and effluent had increased abundances of Actinobacteria, β-Proteobacteria and Firmicutes. There was a significant increase in the antibiotic resistance gene signal from the open Sound to marina to WWTP effluent, suggestive of a potential link to human impacts. Mobile genetic elements associated with environmental and pathogenic bacteria were also differentially abundant across the samples. This study is the first comparative metagenomic survey of Puget Sound and provides baseline data for further assessments of community composition and antibiotic resistance determinants in the environment using next generation sequencing technologies. In addition, these genomic signals of potential human impact can be used to guide initial

  18. Metagenomic profiling of microbial composition and antibiotic resistance determinants in Puget Sound.

    Directory of Open Access Journals (Sweden)

    Jesse A Port

    Full Text Available Human-health relevant impacts on marine ecosystems are increasing on both spatial and temporal scales. Traditional indicators for environmental health monitoring and microbial risk assessment have relied primarily on single species analyses and have provided only limited spatial and temporal information. More high-throughput, broad-scale approaches to evaluate these impacts are therefore needed to provide a platform for informing public health. This study uses shotgun metagenomics to survey the taxonomic composition and antibiotic resistance determinant content of surface water bacterial communities in the Puget Sound estuary. Metagenomic DNA was collected at six sites in Puget Sound in addition to one wastewater treatment plant (WWTP) that discharges into the Sound and pyrosequenced. A total of ~550 Mbp (1.4 million reads) were obtained, 22 Mbp of which could be assembled into contigs. While the taxonomic and resistance determinant profiles across the open Sound samples were similar, unique signatures were identified when comparing these profiles across the open Sound, a nearshore marina and WWTP effluent. The open Sound was dominated by α-Proteobacteria (in particular Rhodobacterales sp.), γ-Proteobacteria and Bacteroidetes while the marina and effluent had increased abundances of Actinobacteria, β-Proteobacteria and Firmicutes. There was a significant increase in the antibiotic resistance gene signal from the open Sound to marina to WWTP effluent, suggestive of a potential link to human impacts. Mobile genetic elements associated with environmental and pathogenic bacteria were also differentially abundant across the samples. This study is the first comparative metagenomic survey of Puget Sound and provides baseline data for further assessments of community composition and antibiotic resistance determinants in the environment using next generation sequencing technologies. In addition, these genomic signals of potential human impact can be used

  19. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

    Science.gov (United States)

    Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.

    2015-01-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477
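
    The completeness/contamination idea behind such quality estimates can be illustrated with simple single-copy marker counting; note that CheckM itself uses lineage-specific, collocated marker sets and is considerably more sophisticated. A toy Python sketch (marker names are placeholders):

      # Illustrative sketch of marker-gene-based quality estimates, assuming a
      # simple universal single-copy marker set; CheckM's real method uses
      # lineage-specific, collocated marker sets and is more involved.
      from collections import Counter

      def completeness_and_contamination(found_markers, expected_markers):
          """found_markers: marker IDs detected in a draft genome (with repeats
          if a marker occurs more than once); expected_markers: set of
          single-copy markers expected for the lineage."""
          counts = Counter(m for m in found_markers if m in expected_markers)
          completeness = 100.0 * len(counts) / len(expected_markers)
          extra_copies = sum(c - 1 for c in counts.values())
          contamination = 100.0 * extra_copies / len(expected_markers)
          return completeness, contamination

      expected = {"rpsB", "rpsC", "rplB", "gyrB", "recA"}   # toy marker set
      found = ["rpsB", "rplB", "gyrB", "gyrB", "recA"]      # gyrB seen twice
      print(completeness_and_contamination(found, expected))  # (80.0, 20.0)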

  20. Point Cloud Classification of Tesserae from Terrestrial Laser Data Combined with Dense Image Matching for Archaeological Information Extraction

    Science.gov (United States)

    Poux, F.; Neuville, R.; Billen, R.

    2017-08-01

    Reasoning from information extraction given by point cloud data mining allows contextual adaptation and fast decision making. However, to achieve this perceptive level, a point cloud must be semantically rich, retaining relevant information for the end user. This paper presents an automatic knowledge-based method for pre-processing multi-sensory data and classifying a hybrid point cloud from both terrestrial laser scanning and dense image matching. Using 18 features including the sensor's biased data, each tessera in the high-density point cloud from the 3D captured complex mosaics of Germigny-des-prés (France) is segmented via a colour-based multi-scale abstraction extracting connectivity. A 2D surface and outline polygon of each tessera is generated by a RANSAC plane extraction and convex hull fitting. Knowledge is then used to classify every tessera based on its size, surface, shape, material properties and the class of its neighbours. The detection and semantic enrichment method shows promising results of 94% correct semantization, a first step toward the creation of an archaeological smart point cloud.

  1. Isolation and characterization of novel lipases/esterases from a bovine rumen metagenome.

    Science.gov (United States)

    Privé, Florence; Newbold, C Jamie; Kaderbhai, Naheed N; Girdwood, Susan G; Golyshina, Olga V; Golyshin, Peter N; Scollan, Nigel D; Huws, Sharon A

    2015-07-01

    Improving the health beneficial fatty acid content of meat and milk is a major challenge requiring an increased understanding of rumen lipid metabolism. In this study, we isolated and characterized rumen bacterial lipases/esterases using functional metagenomics. Metagenomic libraries were constructed from DNA extracted from strained rumen fluid (SRF), solid-attached bacteria (SAB) and liquid-associated rumen bacteria (LAB), ligated into a fosmid vector and subsequently transformed into an Escherichia coli host. Fosmid libraries consisted of 7,744; 8,448; and 7,680 clones with an average insert size of 30 to 35 kbp for SRF, SAB and LAB, respectively. Transformants were screened on spirit blue agar plates containing tributyrin for lipase/esterase activity. Five SAB and four LAB clones exhibited lipolytic activity, and no positive clones were found in the SRF library. Fosmids from positive clones were pyrosequenced and twelve putative lipase/esterase genes and two phospholipase genes retrieved. Although the derived proteins clustered into diverse esterase and lipase families, a degree of novelty was seen, with homology ranging from 40 to 78% following BlastP searches. Isolated lipases/esterases exhibited activity against mostly short- to medium-chain substrates across a range of temperatures and pH. The function of these novel enzymes recovered in ruminal metabolism needs further investigation, alongside their potential industrial uses.

  2. Metagenomic survey of bacterial diversity in the atmosphere of Mexico City using different sampling methods.

    Science.gov (United States)

    Serrano-Silva, N; Calderón-Ezquerro, M C

    2018-04-01

    The identification of airborne bacteria has traditionally been performed by retrieval in culture media, but the bacterial diversity in the air is underestimated using this method because many bacteria are not readily cultured. Advances in DNA sequencing technology have produced a broad knowledge of genomics and metagenomics, which can greatly improve our ability to identify and study the diversity of airborne bacteria. However, researchers are facing several challenges, particularly the efficient retrieval of low-density microorganisms from the air and the lack of standardized protocols for sample collection and processing. In this study, we tested three methods for sampling bioaerosols - a Durham-type spore trap (Durham), a seven-day recording volumetric spore trap (HST), and a high-throughput 'Jet' spore and particle sampler (Jet) - and recovered metagenomic DNA for 16S rDNA sequencing. Samples were simultaneously collected with the three devices during one week, and the sequencing libraries were analyzed. A simple and efficient method for collecting bioaerosols and extracting good quality DNA for high-throughput sequencing was standardized. The Durham sampler preferentially collected Cyanobacteria; the HST, Actinobacteria, Proteobacteria and Firmicutes; and the Jet, mainly Proteobacteria and Firmicutes. The HST sampler captured the greatest airborne bacterial diversity. More experiments are necessary to select the right sampler, depending on study objectives, which may require monitoring and collecting specific airborne bacteria. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Metagenomic Survey of Viral Diversity Obtained from Feces of Subantarctic and South American Fur Seals.

    Directory of Open Access Journals (Sweden)

    Mariana Kluge

    Full Text Available The Brazilian South coast seasonally hosts numerous marine species, observed particularly during winter months. Some animals, including fur seals, are found dead or debilitated along the shore and may harbor potential pathogens within their microbiota. In the present study, a metagenomic approach was performed to evaluate the viral diversity in feces of fur seals found deceased along the coast of the state of Rio Grande do Sul. The fecal virome of two fur seal species was characterized: the South American fur seal (Arctocephalus australis) and the Subantarctic fur seal (Arctocephalus tropicalis). Fecal samples from 10 specimens (A. australis, n = 5; A. tropicalis, n = 5) were collected and viral particles were purified, extracted and amplified with a random PCR. The products were sequenced through Ion Torrent and Illumina platforms and assembled reads were submitted to BLASTx searches. Both viromes were dominated by bacteriophages and included a number of potentially novel virus genomes. Sequences of picobirnaviruses, picornaviruses and a hepevirus-like sequence were identified in A. australis. A rotavirus related to group C, a novel member of the Sakobuvirus and a sapovirus very similar to California sea lion sapovirus 1 were found in A. tropicalis. Additionally, sequences of members of the Anelloviridae and Parvoviridae families were detected in both fur seal species. This is the first metagenomic study to screen the fecal virome of fur seals, contributing to a better understanding of the complexity of the viral community present in the intestinal microbiota of these animals.

  4. The potential of viral metagenomics in blood transfusion safety.

    Science.gov (United States)

    Sauvage, V; Gomez, J; Boizeau, L; Laperche, S

    2017-09-01

    Thanks to major advances in high-throughput sequencing over the last ten years, it is now possible via metagenomics to define the spectrum of the microbial sequences present in human blood samples. Metagenomic sequencing therefore appears to be a promising approach for the identification and global surveillance of new, emerging and/or unexpected viruses that could impair blood transfusion safety. However, despite considerable advantages compared to traditional methods of pathogen identification, this non-targeted approach presents several drawbacks, including a lack of sensitivity and contaminant sequence issues. With further improvements, especially to increase sensitivity, metagenomic sequencing should in the near future become an additional diagnostic tool in the infectious disease field, and especially in blood transfusion safety. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  5. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    DEFF Research Database (Denmark)

    Moore, Aimee M.; Munck, Christian; Sommer, Morten Otto Alexander

    2011-01-01

    The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity...... microorganisms, but relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community, independent of identity to known genes, by subjecting the metagenome to functional assays in a genetically tractable host....... Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex...

  6. Lung region extraction based on the model information and the inversed MIP method by using chest CT images

    International Nuclear Information System (INIS)

    Tomita, Toshihiro; Miguchi, Ryosuke; Okumura, Toshiaki; Yamamoto, Shinji; Matsumoto, Mitsuomi; Tateno, Yukio; Iinuma, Takeshi; Matsumoto, Toru.

    1997-01-01

    We developed a lung region extraction method based on model information and the inversed MIP method for Lung Cancer Screening CT (LSCT). The original model is composed of typical 3-D lung contour lines, a body axis, an apical point, and a convex hull. First, the body axis, the apical point, and the convex hull are automatically extracted from the input image. Next, the model is transformed by an affine transformation to fit those of the input image. Using the same affine transformation coefficients, the typical lung contour lines are also transferred, yielding rough contour lines for the input image. Experimental results for 68 samples showed this method to be quite promising. (author)
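
    The key step is the affine transformation estimated from model landmarks and reused to transfer the typical contour lines onto the input image. A minimal 2-D NumPy sketch of that idea (the actual method is 3-D and uses the body axis, apical point and convex hull as landmarks; all coordinates below are made up):

      # Minimal 2-D illustration of fitting an affine transform from model
      # landmarks to image landmarks and applying it to model contour points.
      # The original method is 3-D; the points here are invented.
      import numpy as np

      model_landmarks = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
      image_landmarks = np.array([[10.0, 5.0], [30.0, 6.0], [11.0, 45.0], [31.0, 46.0]])

      # Solve model_landmarks @ A + t = image_landmarks in least-squares form.
      ones = np.ones((model_landmarks.shape[0], 1))
      X = np.hstack([model_landmarks, ones])          # homogeneous coordinates
      params, *_ = np.linalg.lstsq(X, image_landmarks, rcond=None)
      A, t = params[:2], params[2]

      model_contour = np.array([[0.2, 0.1], [0.5, 0.05], [0.8, 0.1]])
      transferred_contour = model_contour @ A + t      # rough contour in the image
      print(transferred_contour)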

  7. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction.

    Science.gov (United States)

    Jonnalagadda, Siddhartha; Gonzalez, Graciela

    2010-11-13

    BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested our tool's impact on the task of protein-protein interaction (PPI) extraction, and it improved the F-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.

  8. Unsupervised improvement of named entity extraction in short informal context using disambiguation clues

    NARCIS (Netherlands)

    Habib, Mena Badieh; van Keulen, Maurice

    2012-01-01

    Short context messages (like tweets and SMS’s) are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts done in this direction rely on machine learning techniques

  9. Automated Methods to Extract Patient New Information from Clinical Notes in Electronic Health Record Systems

    Science.gov (United States)

    Zhang, Rui

    2013-01-01

    The widespread adoption of Electronic Health Record (EHR) has resulted in rapid text proliferation within clinical care. Clinicians' use of copying and pasting functions in EHR systems further compounds this by creating a large amount of redundant clinical information in clinical documents. A mixture of redundant information (especially outdated…

  10. Extracting principles for information management adaptability during crisis response : A dynamic capability view

    NARCIS (Netherlands)

    Bharosa, N.; Janssen, M.F.W.H.A.

    2010-01-01

    During crises, relief agency commanders have to make decisions in a complex and uncertain environment, requiring them to continuously adapt to unforeseen environmental changes. In the process of adaptation, the commanders depend on information management systems for information. Yet there are still

  11. Extracting protein dynamics information from overlapped NMR signals using relaxation dispersion difference NMR spectroscopy.

    Science.gov (United States)

    Konuma, Tsuyoshi; Harada, Erisa; Sugase, Kenji

    2015-12-01

    Protein dynamics plays important roles in many biological events, such as ligand binding and enzyme reactions. NMR is mostly used for investigating such protein dynamics in a site-specific manner. Recently, NMR has been actively applied to large proteins and intrinsically disordered proteins, which are attractive research targets. However, signal overlap, which is often observed for such proteins, hampers accurate analysis of NMR data. In this study, we have developed a new methodology called relaxation dispersion difference that can extract conformational exchange parameters from overlapped NMR signals measured using relaxation dispersion spectroscopy. In relaxation dispersion measurements, the signal intensities of fluctuating residues vary according to the Carr-Purcell-Meiboom-Gill pulsing interval, whereas those of non-fluctuating residues are constant. Therefore, subtraction of each relaxation dispersion spectrum from that with the highest signal intensities, measured at the shortest pulsing interval, leaves only the signals of the fluctuating residues. This is the principle of the relaxation dispersion difference method. This new method enabled us to extract exchange parameters from overlapped signals of heme oxygenase-1, which is a relatively large protein. The results indicate that the structural flexibility of a kink in the heme-binding site is important for efficient heme binding. Relaxation dispersion difference requires neither selectively labeled samples nor modification of pulse programs; thus it will have wide applications in protein dynamics analysis.

  12. Extracting protein dynamics information from overlapped NMR signals using relaxation dispersion difference NMR spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Konuma, Tsuyoshi [Icahn School of Medicine at Mount Sinai, Department of Structural and Chemical Biology (United States); Harada, Erisa [Suntory Foundation for Life Sciences, Bioorganic Research Institute (Japan); Sugase, Kenji, E-mail: sugase@sunbor.or.jp, E-mail: sugase@moleng.kyoto-u.ac.jp [Kyoto University, Department of Molecular Engineering, Graduate School of Engineering (Japan)

    2015-12-15

    Protein dynamics plays important roles in many biological events, such as ligand binding and enzyme reactions. NMR is mostly used for investigating such protein dynamics in a site-specific manner. Recently, NMR has been actively applied to large proteins and intrinsically disordered proteins, which are attractive research targets. However, signal overlap, which is often observed for such proteins, hampers accurate analysis of NMR data. In this study, we have developed a new methodology called relaxation dispersion difference that can extract conformational exchange parameters from overlapped NMR signals measured using relaxation dispersion spectroscopy. In relaxation dispersion measurements, the signal intensities of fluctuating residues vary according to the Carr-Purcell-Meiboom-Gill pulsing interval, whereas those of non-fluctuating residues are constant. Therefore, subtraction of each relaxation dispersion spectrum from that with the highest signal intensities, measured at the shortest pulsing interval, leaves only the signals of the fluctuating residues. This is the principle of the relaxation dispersion difference method. This new method enabled us to extract exchange parameters from overlapped signals of heme oxygenase-1, which is a relatively large protein. The results indicate that the structural flexibility of a kink in the heme-binding site is important for efficient heme binding. Relaxation dispersion difference requires neither selectively labeled samples nor modification of pulse programs; thus it will have wide applications in protein dynamics analysis.
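
    The subtraction at the heart of the relaxation dispersion difference method is straightforward to express numerically: each dispersion profile is subtracted from the reference recorded at the shortest pulsing interval, and only fluctuating residues retain a signal. A sketch with synthetic intensities (values and threshold are illustrative only):

      # Sketch of the relaxation dispersion difference idea: subtract each
      # dispersion spectrum from the reference recorded at the shortest
      # CPMG pulsing interval; non-fluctuating residues cancel out.
      # Intensities below are synthetic, for illustration only.
      import numpy as np

      # rows: residues, columns: CPMG settings (first column = reference,
      # i.e. shortest pulsing interval / highest intensities)
      intensities = np.array([
          [100.0, 99.5, 100.2, 99.8],   # non-fluctuating residue
          [100.0, 80.0, 70.0, 65.0],    # fluctuating residue (exchange broadening)
      ])

      reference = intensities[:, [0]]          # spectrum at shortest pulsing interval
      difference = reference - intensities     # near zero for rigid residues

      fluctuating = np.max(difference, axis=1) > 5.0   # simple threshold
      print(difference)
      print("fluctuating residues:", np.where(fluctuating)[0])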

  13. Wavelet analysis of molecular dynamics: Efficient extraction of time-frequency information in ultrafast optical processes

    International Nuclear Information System (INIS)

    Prior, Javier; Castro, Enrique; Chin, Alex W.; Almeida, Javier; Huelga, Susana F.; Plenio, Martin B.

    2013-01-01

    New experimental techniques based on nonlinear ultrafast spectroscopies have been developed over the last few years, and have been demonstrated to provide powerful probes of quantum dynamics in different types of molecular aggregates, including both natural and artificial light harvesting complexes. Fourier transform-based spectroscopies have been particularly successful, yet “complete” spectral information normally necessitates the loss of all information on the temporal sequence of events in a signal. This information, though, is particularly important in transient or multi-stage processes, in which the spectral decomposition of the data evolves in time. By going through several examples of ultrafast quantum dynamics, we demonstrate that the use of wavelets provides an efficient and accurate way to simultaneously acquire both temporal and frequency information about a signal, and argue that this greatly aids the elucidation and interpretation of the physical processes responsible for non-stationary spectroscopic features, such as those encountered in coherent excitonic energy transport.
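
    The wavelet transform described here correlates the signal with scaled copies of a localized oscillatory window, yielding a time-frequency map rather than a single global spectrum. A self-contained NumPy sketch with a Morlet-like wavelet (parameters and the test signal are arbitrary illustrations):

      # Tiny continuous-wavelet-transform sketch with a Morlet-like wavelet,
      # showing how time and frequency information are obtained together.
      import numpy as np

      def morlet(t, scale, w0=6.0):
          """Complex Morlet-like wavelet evaluated at times t for a given scale."""
          x = t / scale
          return np.exp(1j * w0 * x) * np.exp(-0.5 * x**2) / np.sqrt(scale)

      def cwt(signal, dt, scales):
          """Return a (len(scales), len(signal)) array of wavelet coefficients."""
          n = len(signal)
          t = (np.arange(n) - n // 2) * dt
          coeffs = np.empty((len(scales), n), dtype=complex)
          for i, s in enumerate(scales):
              kernel = morlet(t, s)
              coeffs[i] = np.convolve(signal, np.conj(kernel)[::-1], mode="same") * dt
          return coeffs

      dt = 0.01
      time = np.arange(0, 10, dt)
      # test signal whose frequency changes halfway through
      signal = np.where(time < 5, np.sin(2 * np.pi * 2 * time), np.sin(2 * np.pi * 6 * time))
      scales = np.linspace(0.05, 1.0, 40)
      power = np.abs(cwt(signal, dt, scales))**2   # time-frequency power map
      print(power.shape)   # (40, 1000)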

  14. Extracting information from an ensemble of GCMs to reliably assess future global runoff change

    NARCIS (Netherlands)

    Sperna Weiland, F.C.; Beek, L.P.H. van; Weerts, A.H.; Bierkens, M.F.P.

    2011-01-01

    Future runoff projections derived from different global climate models (GCMs) show large differences. Therefore, within this study, the information from multiple GCMs has been combined to better assess hydrological changes. For projections of precipitation and temperature, the Reliability ensemble

  15. Investigation of the Impact of Extracting and Exchanging Health Information by Using Internet and Social Networks.

    Science.gov (United States)

    Pistolis, John; Zimeras, Stelios; Chardalias, Kostas; Roupa, Zoe; Fildisis, George; Diomidous, Marianna

    2016-06-01

    Social networks have been embedded in our daily life for a long time. They constitute a powerful tool used nowadays for both searching and exchanging information on different issues, via Internet search engines (Google, Bing, etc.) and social networks (Facebook, Twitter, etc.). This paper presents the results of a study of the frequency and type of usage of the Internet and social networks by the general public and health professionals. The objectives of the research were focused on investigating how often health information is sought and searched for in social media by both individuals and health practitioners. The exchange of information is a procedure that involves issues of reliability and quality of information. In this research, advanced statistical techniques are used to investigate the participants' profiles in using social networks for searching and exchanging information on health issues. Based on the answers, 93% of the participants use the Internet to find information on health subjects. According to the principal component analysis, the most important health subjects were nutrition (0.719%), respiratory issues (0.79%), cardiological issues (0.777%), psychological issues (0.667%) and total (73.8%). The research results, based on different statistical techniques, revealed that 61.2% of the males and 56.4% of the females intended to use social networks for searching for medical information. Based on the principal component analysis, the most important sources that the participants mentioned were the use of the Internet and social networks for exchanging information on health issues. These sources proved to be of paramount importance to the participants of the study. The same holds for nursing, medical and administrative staff in hospitals.

  16. A function-based screen for seeking RubisCO active clones from metagenomes: novel enzymes influencing RubisCO activity.

    Science.gov (United States)

    Böhnke, Stefanie; Perner, Mirjam

    2015-03-01

    Ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a key enzyme of the Calvin cycle, which is responsible for most of Earth's primary production. Although research on RubisCO genes and enzymes in plants, cyanobacteria and bacteria has been ongoing for years, still little is understood about its regulation and activation in bacteria. Even more so, hardly any information exists about the function of metagenomic RubisCOs and the role of the enzymes encoded on the flanking DNA owing to the lack of available function-based screens for seeking active RubisCOs from the environment. Here we present the first solely activity-based approach for identifying RubisCO active fosmid clones from a metagenomic library. We constructed a metagenomic library from hydrothermal vent fluids and screened 1056 fosmid clones. Twelve clones exhibited RubisCO activity and the metagenomic fragments resembled genes from Thiomicrospira crunogena. One of these clones was further analyzed. It contained a 35.2 kb metagenomic insert carrying the RubisCO gene cluster and flanking DNA regions. Knockouts of twelve genes and two intergenic regions on this metagenomic fragment demonstrated that the RubisCO activity was significantly impaired and was attributed to deletions in genes encoding putative transcriptional regulators and those believed to be vital for RubisCO activation. Our new technique revealed a novel link between a poorly characterized gene and RubisCO activity. This screen opens the door to directly investigating RubisCO genes and respective enzymes from environmental samples.

  17. Amplitude extraction in pseudoscalar-meson photoproduction: towards a situation of complete information

    International Nuclear Information System (INIS)

    Nys, Jannes; Vrancx, Tom; Ryckebusch, Jan

    2015-01-01

    A complete set for pseudoscalar-meson photoproduction is a minimum set of observables from which one can determine the underlying reaction amplitudes unambiguously. The complete sets considered in this work involve single- and double-polarization observables. It is argued that for extracting amplitudes from data, the transversity representation of the reaction amplitudes offers advantages over alternate representations. It is shown that with the available single-polarization data for the p(γ,K+)Λ reaction, the energy and angular dependence of the moduli of the normalized transversity amplitudes in the resonance region can be determined to a fair accuracy. Determining the relative phases of the amplitudes from double-polarization observables is far less evident. (paper)

  18. Metagenomic analysis of bacterial community structure and diversity of lignocellulolytic bacteria in Vietnamese native goat rumen.

    Science.gov (United States)

    Do, Thi Huyen; Dao, Trong Khoa; Nguyen, Khanh Hoang Viet; Le, Ngoc Giang; Nguyen, Thi Mai Phuong; Le, Tung Lam; Phung, Thu Nguyet; van Straalen, Nico M; Roelofs, Dick; Truong, Nam Hai

    2018-05-01

    In a previous study, analysis of Illumina-sequenced metagenomic DNA data of bacteria in the Vietnamese goats' rumen showed a high diversity of putative lignocellulolytic genes. In this study, taxonomic profiling of the microbial community and the lignocellulolytic bacterial population in the rumen was conducted to elucidate the role of bacterial community structure in the effective degradation of plant materials. The metagenomic data were subjected to Basic Local Alignment Search Tool (BLASTX) searches against the National Center for Biotechnology Information non-redundant sequence database. The BLASTX hits were then processed by the Metagenome Analyzer program to statistically analyze the abundance of taxa. The microbial community in the rumen is defined by a dominance of Bacteroidetes over Firmicutes; the ratio of Firmicutes to Bacteroidetes was 0.36:1. An abundance of Synergistetes, uniquely identified in the goat microbiome, may be shaped by host genotype. With regard to bacterial lignocellulose degraders, the ratio of lignocellulolytic genes affiliated with Firmicutes to genes linked to Bacteroidetes was 0.11:1, in which the genes encoding putative hemicellulases, carbohydrate esterases and polysaccharide lyases originating from Bacteroidetes were 14 to 20 times more abundant than those from Firmicutes. Firmicutes seem to possess more cellulose hydrolysis capacity, showing a Firmicutes/Bacteroidetes ratio of 0.35:1. Analysis of potential lignocellulolytic degraders shows that four species belonged to the Bacteroidetes phylum, while two species belonged to the Firmicutes phylum, harbouring at least 12 different catalytic domains covering lignocellulose pretreatment as well as cellulose and hemicellulose saccharification. Based on these findings, we speculate that increasing the Bacteroidetes membership to keep a low Firmicutes-to-Bacteroidetes ratio in the goat rumen most likely results in increased lignocellulose digestion.
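
    Ratios such as the 0.36:1 Firmicutes-to-Bacteroidetes figure reduce to counting reads (or genes) assigned to each phylum. A toy Python sketch of that computation (the assignment counts are invented to reproduce a 0.36:1 ratio, not taken from the goat rumen data):

      # Illustrative computation of a Firmicutes-to-Bacteroidetes ratio from
      # per-read (or per-gene) phylum assignments; the assignments below are
      # invented for the example.
      from collections import Counter

      assignments = (["Bacteroidetes"] * 500 + ["Firmicutes"] * 180 +
                     ["Synergistetes"] * 40 + ["Proteobacteria"] * 60)

      counts = Counter(assignments)
      ratio = counts["Firmicutes"] / counts["Bacteroidetes"]
      print(f"Firmicutes:Bacteroidetes = {ratio:.2f}:1")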

  19. An artificial functional family filter in homolog searching in next-generation sequencing metagenomics.

    Directory of Open Access Journals (Sweden)

    Ruofei Du

    Full Text Available In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read lengths produced by next-generation sequencing technologies pose a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem; we aim to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads derived from the coding sequences of COG protein members are distributed differently from the scores of reads derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by the combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score-cutoff filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cutoff methods currently used in the literature.
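
    The proposed filter boils down to discarding short-read hits whose similarity scores fall below an empirically chosen cutoff before counting family abundances. A hedged Python sketch (the cutoff value and the hits are invented for illustration):

      # Sketch of filtering short-read BLAST hits by score before building a
      # COG abundance profile, to suppress "artificial" family assignments.
      # The score cutoff and the hits themselves are illustrative only.
      from collections import Counter

      # (read_id, assigned_COG, bit_score) tuples, e.g. parsed from tabular BLAST output
      hits = [
          ("read1", "COG0052", 85.0),
          ("read2", "COG0052", 32.0),   # weak hit, likely an artificial assignment
          ("read3", "COG1131", 74.5),
          ("read4", "COG1131", 28.0),   # weak hit
          ("read5", "COG0052", 90.1),
      ]

      SCORE_CUTOFF = 40.0   # assumed empirical cutoff

      profile = Counter(cog for _, cog, score in hits if score >= SCORE_CUTOFF)
      print(profile)   # Counter({'COG0052': 2, 'COG1131': 1})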

  20. The Analysis of Tree Species Distribution Information Extraction and Landscape Pattern Based on Remote Sensing Images

    Directory of Open Access Journals (Sweden)

    Yi Zeng

    2017-08-01

    Full Text Available The forest ecosystem is the largest land vegetation type and plays an irreplaceable role with its unique value. At the landscape scale, research on forest landscape pattern has become a current hot spot, within which the study of forest canopy structure is particularly important: canopy structure determines the process and strength of forest energy flow, which in turn influences ecosystem adjustment to climate and species diversity to some extent. The extraction of the factors influencing canopy structure and the analysis of the vegetation distribution pattern are therefore especially important. To address these problems, remote sensing technology, which is superior to other technical means because of its timeliness and large-scale monitoring capability, is applied in this study. Taking Lingkong Mountain as the study area, the paper uses remote sensing imagery to analyze the forest distribution pattern and obtain the spatial characteristics of canopy structure distribution, with DEM data as the basic data for extracting the factors influencing canopy structure. The tree distribution pattern is further analyzed using terrain parameters, spatial analysis tools and quantitative simulation of surface processes. The Hydrological Analysis tool is used to build a distributed hydrological model, and corresponding algorithms are applied to determine surface water flow paths, the river network and basin boundaries. Results show that the distribution of dominant tree species presents a patchy pattern at the landscape scale, with spatial heterogeneity that is closely related to terrain factors. Overlay analysis of aspect, slope and the forest distribution pattern identifies the areas most suitable for stand growth and the better living conditions.

  1. Linking attentional processes and conceptual problem solving: visual cues facilitate the automaticity of extracting relevant information from diagrams.

    Science.gov (United States)

    Rouinfar, Amy; Agra, Elise; Larson, Adam M; Rebello, N Sanjay; Loschky, Lester C

    2014-01-01

    This study investigated links between visual attention processes and conceptual problem solving. This was done by overlaying visual cues on conceptual physics problem diagrams to direct participants' attention to relevant areas to facilitate problem solving. Participants (N = 80) individually worked through four problem sets, each containing a diagram, while their eye movements were recorded. Each diagram contained regions that were relevant to solving the problem correctly and separate regions related to common incorrect responses. Problem sets contained an initial problem, six isomorphic training problems, and a transfer problem. The cued condition saw visual cues overlaid on the training problems. Participants' verbal responses were used to determine their accuracy. This study produced two major findings. First, short-duration visual cues that draw attention to solution-relevant information, and aid in organizing and integrating it, facilitate both immediate problem solving and generalization of that ability to new problems. Thus, visual cues can facilitate re-representing a problem and overcoming an impasse, enabling a correct solution. Importantly, these cueing effects on problem solving did not involve the solvers' attention necessarily embodying the solution to the problem, but were instead caused by solvers attending to and integrating relevant information in the problems into a solution path. Second, this study demonstrates that when such cues are used across multiple problems, solvers can automatize the extraction of problem-relevant information. These results suggest that low-level attentional selection processes provide a necessary gateway for relevant information to be used in problem solving, but are generally not sufficient for correct problem solving. Instead, factors that lead a solver to an impasse and to organize and integrate problem information also greatly facilitate arriving at correct solutions.

  2. The Application of Chinese High-Spatial Remote Sensing Satellite Image in Land Law Enforcement Information Extraction

    Science.gov (United States)

    Wang, N.; Yang, R.

    2018-04-01

    Chinese high-resolution (HR) remote sensing satellites have made a huge leap in the past decade. Commercial satellite datasets such as GF-1, GF-2 and ZY-3 images, with panchromatic (PAN) resolutions of 2 m, 1 m and 2.1 m and multispectral (MS) resolutions of 8 m, 4 m and 5.8 m respectively, have emerged in recent years. Chinese HR satellite imagery can be downloaded free of charge for public welfare use. Local governments have begun to employ more professional technicians to improve traditional land management technology. This paper focuses on analysing the actual requirements of applications in government land law enforcement in Guangxi Autonomous Region. 66 counties in Guangxi Autonomous Region were selected for illegal land utilization spot extraction with fused Chinese HR images. The procedure comprises: A. Defining illegal land utilization spot types. B. Data collection: GF-1, GF-2, and ZY-3 datasets were acquired in the first half of 2016 and other auxiliary data were collected in 2015. C. Batch processing: HR images were preprocessed in batch through an ENVI/IDL tool. D. Illegal land utilization spot extraction by visual interpretation. E. Obtaining attribute data with an ArcGIS Geoprocessor (GP) model. F. Thematic mapping and surveying. Through analysis of the results from 42 counties, law enforcement officials found 1092 illegal land use spots and 16 suspected illegal mining spots. The results show that Chinese HR satellite images have great potential for feature information extraction and that the processing procedure is robust.

  3. Implementation of generalized quantum measurements: Superadditive quantum coding, accessible information extraction, and classical capacity limit

    International Nuclear Information System (INIS)

    Takeoka, Masahiro; Fujiwara, Mikio; Mizuno, Jun; Sasaki, Masahide

    2004-01-01

    Quantum-information theory predicts that when the transmission resource is doubled in quantum channels, the amount of information transmitted can be increased more than twice by quantum-channel coding technique, whereas the increase is at most twice in classical information theory. This remarkable feature, the superadditive quantum-coding gain, can be implemented by appropriate choices of code words and corresponding quantum decoding which requires a collective quantum measurement. Recently, an experimental demonstration was reported [M. Fujiwara et al., Phys. Rev. Lett. 90, 167906 (2003)]. The purpose of this paper is to describe our experiment in detail. Particularly, a design strategy of quantum-collective decoding in physical quantum circuits is emphasized. We also address the practical implication of the gain on communication performance by introducing the quantum-classical hybrid coding scheme. We show how the superadditive quantum-coding gain, even in a small code length, can boost the communication performance of conventional coding techniques

  4. Extraction of basic roadway information for non-state roads in Florida : [summary].

    Science.gov (United States)

    2015-07-01

    The Florida Department of Transportation (FDOT) maintains a map of all the roads in Florida, : containing over one and a half million road links. For planning purposes, a wide variety : of information, such as stop lights, signage, lane number, and s...

  5. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats

    NARCIS (Netherlands)

    Pérez-Rodríguez, F.; Asselt, van E.D.; García-Gimeno, R.M.; Zurera, G.; Zwietering, M.H.

    2007-01-01

    The risk assessment study of Listeria monocytogenes in ready-to-eat foods conducted by the U.S. Food and Drug Administration is an example of an extensive quantitative microbiological risk assessment that could be used by risk analysts and other scientists to obtain information and by managers and

  6. Synthetic aperture radar ship discrimination, generation and latent variable extraction using information maximizing generative adversarial networks

    CSIR Research Space (South Africa)

    Schwegmann, Colin P

    2017-07-01

    Full Text Available such as Synthetic Aperture Radar imagery. To aid in the creation of improved machine learning-based ship detection and discrimination methods this paper applies a type of neural network known as an Information Maximizing Generative Adversarial Network. Generative...

  7. An algorithm for detecting eukaryotic sequences in metagenomic ...

    Indian Academy of Sciences (India)

    species but also from accidental contamination from the genome of eukaryotic host cells. The latter scenario generally occurs in the case of host-associated metagenomes, e.g. microbes living in human gut. In such cases, one needs to identify and remove contaminating host DNA sequences, since the latter sequences will ...

  8. SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

    2011-01-01

    Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms is significantly higher. However, being alignment-based, the latter class of algorithms require enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
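
    The hybrid "composition first, alignment second" idea can be illustrated by using a tetranucleotide signature to shortlist candidate reference bins so that alignment only runs against a small subset. The sketch below illustrates the general principle only and is not SPHINX's actual implementation; all sequences and bin names are toy examples:

      # Hedged sketch of a two-stage ("composition then alignment") binning idea:
      # a tetranucleotide signature first narrows the candidate reference bins,
      # so that the expensive alignment step only runs against a small subset.
      from itertools import product
      import math

      KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

      def tetra_freqs(seq):
          counts = {k: 0 for k in KMERS}
          for i in range(len(seq) - 3):
              kmer = seq[i:i + 4]
              if kmer in counts:
                  counts[kmer] += 1
          total = max(sum(counts.values()), 1)
          return [counts[k] / total for k in KMERS]

      def euclidean(a, b):
          return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

      def closest_bins(read, bin_signatures, top_n=3):
          """Return the top_n bins whose composition is closest to the read's."""
          sig = tetra_freqs(read)
          ranked = sorted(bin_signatures, key=lambda name: euclidean(sig, bin_signatures[name]))
          return ranked[:top_n]   # alignment would then be restricted to these bins

      # toy reference signatures (in practice computed from reference genomes)
      bins = {"binA": tetra_freqs("ACGT" * 200), "binB": tetra_freqs("AATT" * 200),
              "binC": tetra_freqs("GGCC" * 200)}
      print(closest_bins("ACGTACGTAAGT" * 10, bins, top_n=2))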

  9. A feruloyl esterase derived from a leachate metagenome library

    CSIR Research Space (South Africa)

    Rashamuse, K

    2012-01-01

    Full Text Available A feruloyl esterase encoding gene (designated fae6), derived from a leachate metagenomic library, was cloned and the nucleotide sequence of the insert DNA determined. Translational analysis revealed that fae6 consists of a 515 amino acid polypeptide...

  10. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    Science.gov (United States)

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
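
    Two of the ensemble strategies mentioned — strict intersection of tool outputs and a majority vote — are simple set operations once each classifier's species calls are collected. A Python sketch with invented calls and tool names:

      # Sketch of two simple ensemble strategies for metagenomic classifier
      # output: strict intersection and majority vote across tools.
      # The species calls below are invented for illustration.
      from collections import Counter

      calls = {
          "tool_kmer":      {"E. coli", "S. aureus", "B. subtilis"},
          "tool_alignment": {"E. coli", "S. aureus"},
          "tool_marker":    {"E. coli", "B. subtilis", "P. aeruginosa"},
      }

      intersection = set.intersection(*calls.values())

      votes = Counter(sp for species in calls.values() for sp in species)
      majority = {sp for sp, n in votes.items() if n >= 2}

      print("intersection:", intersection)   # {'E. coli'}
      print("majority vote:", majority)      # {'E. coli', 'S. aureus', 'B. subtilis'}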

  11. Metaviz: interactive statistical and visual analysis of metagenomic data.

    Science.gov (United States)

    Wagner, Justin; Chelaru, Florin; Kancherla, Jayaram; Paulson, Joseph N; Zhang, Alexander; Felix, Victor; Mahurkar, Anup; Elmqvist, Niklas; Corrada Bravo, Héctor

    2018-04-06

    Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available but significant effort is required for users to effectively organize, explore and integrate it, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source with the code, documentation and tutorials publicly accessible.

  12. A human gut microbial gene catalogue established by metagenomic sequencing

    DEFF Research Database (Denmark)

    dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn

    2010-01-01

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence...

  13. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    Directory of Open Access Journals (Sweden)

    Aimee Marguerite Moore

    2011-10-01

    Full Text Available The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity of this microbial community, its recalcitrance to standard cultivation and the immense diversity of its encoded genes have necessitated the development of novel molecular, microbiological, and genomic tools. Functional metagenomics is one such culture-independent technique, used for decades to study environmental microorganisms but relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community, independent of identity to known genes, by subjecting the metagenome to functional assays in a genetically tractable host. Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex community and its human host.

  14. You had me at "Hello": Rapid extraction of dialect information from spoken words.

    Science.gov (United States)

    Scharinger, Mathias; Monahan, Philip J; Idsardi, William J

    2011-06-15

    Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course of speaker identity, and in particular, dialect processing and identification by electro- or neuromagnetic means. We show here that dialect extraction occurs speaker-independently, pre-attentively and categorically. We used Standard American English and African-American English exemplars of 'Hello' in a magnetoencephalographic (MEG) Mismatch Negativity (MMN) experiment. The MMN as an automatic change detection response of the brain reflected dialect differences that were not entirely reducible to acoustic differences between the pronunciations of 'Hello'. Source analyses of the M100, an auditory evoked response to the vowels suggested additional processing in voice-selective areas whenever a dialect change was detected. These findings are not only relevant for the cognitive neuroscience of language, but also for the social sciences concerned with dialect and race perception. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Extraction of indirectly captured information for use in a comparison of offline pH measurement technologies.

    Science.gov (United States)

    Ritchie, Elspeth K; Martin, Elaine B; Racher, Andy; Jaques, Colin

    2017-06-10

    Understanding the causes of discrepancies in pH readings of a sample can allow more robust pH control strategies to be implemented. It was found that 59.4% of differences between two offline pH measurement technologies for an historical dataset lay outside an expected instrument error range of ±0.02 pH. A new variable, Osmo_Res, was created using multiple linear regression (MLR) to extract information indirectly captured in the recorded measurements for osmolality. Principal component analysis and time series analysis were used to validate the expansion of the historical dataset with the new variable Osmo_Res. MLR was used to identify variables strongly correlated (p<0.05) with differences in pH readings by the two offline pH measurement technologies. These included concentrations of specific chemicals (e.g. glucose) and Osmo_Res, indicating culture medium and bolus feed additions as possible causes of discrepancies between the offline pH measurement technologies. Temperature was also identified as statistically significant. It is suggested that this was a result of differences in pH-temperature compensations employed by the pH measurement technologies. In summary, a method for extracting indirectly captured information has been demonstrated, and it has been shown that competing pH measurement technologies were not necessarily interchangeable at the desired level of control (±0.02 pH). Copyright © 2017 Elsevier B.V. All rights reserved.
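
    Constructing a residual variable like Osmo_Res amounts to regressing osmolality on the other recorded variables and keeping what the regression cannot explain. A hedged NumPy sketch (predictors, coefficients and data are placeholders, not the study's dataset):

      # Hedged sketch: build a residual variable from a multiple linear regression
      # of osmolality on other recorded process variables, in the spirit of the
      # Osmo_Res variable described above. Data and predictor names are invented.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 200
      glucose = rng.normal(5.0, 1.0, n)
      lactate = rng.normal(2.0, 0.5, n)
      osmolality = 280 + 4.0 * glucose + 3.0 * lactate + rng.normal(0, 1.5, n)

      # Least-squares fit: osmolality ~ intercept + glucose + lactate
      X = np.column_stack([np.ones(n), glucose, lactate])
      beta, *_ = np.linalg.lstsq(X, osmolality, rcond=None)

      osmo_res = osmolality - X @ beta   # the part of osmolality not explained
      print(beta)            # fitted coefficients
      print(osmo_res.std())  # residual spread, the "indirectly captured" signal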

  16. Extracting 3d Semantic Information from Video Surveillance System Using Deep Learning

    Science.gov (United States)

    Zhang, J. S.; Cao, J.; Mao, B.; Shen, D. Q.

    2018-04-01

    At present, intelligent video analysis technology is widely used in various fields. Object tracking is one of the important parts of intelligent video surveillance, but traditional target tracking based on the pixel coordinate system of images still has some unavoidable problems: tracking based on pixel coordinates cannot reflect the real position of targets, and it is difficult to track objects across scenes. Based on an analysis of Zhengyou Zhang's camera calibration method, this paper presents a method of target tracking in the target's spatial coordinate system, obtained by converting the target's 2-D image coordinates into 3-D coordinates. The experimental results show that our method recovers the real position changes of targets well and also accurately obtains the target's trajectory in space.
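
    Once the camera intrinsics and pose are known from a Zhang-style calibration, a pixel can be back-projected onto a known plane (for example the ground at Z = 0) to obtain world coordinates. A hedged NumPy sketch of that back-projection (the calibration values are illustrative, and the real method must of course handle arbitrary camera poses and scenes):

      # Hedged sketch: back-project an image pixel onto the ground plane (Z = 0)
      # in world coordinates, assuming the intrinsic matrix K and the pose [R | t]
      # are already known from a Zhang-style calibration. Numbers are illustrative.
      import numpy as np

      K = np.array([[800.0, 0.0, 320.0],
                    [0.0, 800.0, 240.0],
                    [0.0,   0.0,   1.0]])
      R = np.eye(3)                      # toy pose: camera axes aligned with world axes
      t = np.array([0.0, 0.0, 5.0])      # ground plane lies 5 units in front of the camera

      def pixel_to_ground(u, v, K, R, t):
          """Map pixel (u, v) to world (X, Y) on the plane Z = 0."""
          H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # plane-to-image homography
          world = np.linalg.inv(H) @ np.array([u, v, 1.0])
          return world[0] / world[2], world[1] / world[2]

      print(pixel_to_ground(400.0, 300.0, K, R, t))   # world coordinates on the plane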

  17. Information extracting and processing with diffraction enhanced imaging of X-ray

    International Nuclear Information System (INIS)

    Chen Bo; Chinese Academy of Science, Beijing; Chen Chunchong; Jiang Fan; Chen Jie; Ming Hai; Shu Hang; Zhu Peiping; Wang Junyue; Yuan Qingxi; Wu Ziyu

    2006-01-01

    X-ray imaging at high energies has been used for many years in many fields. Conventional X-ray imaging is based on differences in absorption within a sample, and it is difficult to distinguish different tissues of a biological sample because of their small differences in absorption. The authors use the diffraction enhanced imaging (DEI) method and took images of absorption, extinction, scattering and refractivity. In the end, the authors present high-resolution pictures with all this information combined. (authors)
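
    One common DEI reconstruction combines two images taken on the low- and high-angle flanks of the analyser rocking curve to separate apparent absorption from refraction; whether this exact two-image scheme is what the authors used is not stated in the abstract, so the sketch below is only a generic illustration with placeholder rocking-curve values:

      # Hedged sketch of a classic two-image DEI step: recover an apparent
      # absorption image and a refraction-angle image from two images taken on
      # the low- and high-angle flanks of the analyser rocking curve.
      # Rocking-curve values and test data are placeholders.
      import numpy as np

      # rocking-curve reflectivity R and slope dR/dtheta at the two working points
      R_lo, dR_lo = 0.5, +40.0     # low-angle flank (positive slope)
      R_hi, dR_hi = 0.5, -40.0     # high-angle flank (negative slope)

      def dei_reconstruct(I_lo, I_hi):
          """Solve I_lo = I_a*(R_lo + dR_lo*dtheta), I_hi = I_a*(R_hi + dR_hi*dtheta)
          pixel-wise for the apparent absorption image I_a and refraction angle dtheta."""
          I_a = (I_lo * dR_hi - I_hi * dR_lo) / (R_lo * dR_hi - R_hi * dR_lo)
          dtheta = (I_hi * R_lo - I_lo * R_hi) / (I_lo * dR_hi - I_hi * dR_lo)
          return I_a, dtheta

      I_lo = np.array([[0.55, 0.48], [0.60, 0.50]])   # toy 2x2 "images"
      I_hi = np.array([[0.45, 0.52], [0.40, 0.50]])
      I_a, dtheta = dei_reconstruct(I_lo, I_hi)
      print(I_a)
      print(dtheta)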

  18. Architecture and data processing alternatives for the TSE computer. Volume 2: Extraction of topological information from an image by the Tse computer

    Science.gov (United States)

    Jones, J. R.; Bodenheimer, R. E.

    1976-01-01

    A simple programmable Tse processor organization and arithmetic operations necessary for extraction of the desired topological information are described. Hardware additions to this organization are discussed along with trade-offs peculiar to the tse computing concept. An improved organization is presented along with the complementary software for the various arithmetic operations. The performance of the two organizations is compared in terms of speed, power, and cost. Software routines developed to extract the desired information from an image are included.

  19. Comparative analysis of metagenomes of Italian top soil improvers

    International Nuclear Information System (INIS)

    Gigliucci, Federica; Brambilla, Gianfranco; Tozzoli, Rosangela; Michelacci, Valeria; Morabito, Stefano

    2017-01-01

    Biosolids originating from Municipal Waste Water Treatment Plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon on agriculture lands. Their use to amend soil is controversial, as it may lead to the presence of emerging hazards of anthropogenic or animal origin in the environment devoted to food production. In this study, we used a shotgun metagenomics sequencing as a tool to perform a characterization of the hazards related with the TSIs. The samples showed the presence of many virulence genes associated to different diarrheagenic E. coli pathotypes as well as of different antimicrobial resistance-associated genes. The genes conferring resistance to Fluoroquinolones was the most relevant class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent traits associated with the resistance to Methicillin in Staphylococci and genes conferring resistance to Streptothricin, Fosfomycin and Vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15–50% of the sequence reads in the different metagenomes out of the total number of those mapping on the class of resistance to compounds determinants. Moreover the taxonomic analysis performed by comparing compost-based samples and biosolids derived from municipal sewage-sludges treatments divided the samples into separate populations, based on the microbiota composition. The results confirm that the metagenomics is efficient to detect genomic traits associated with pathogens and antimicrobial resistance in complex matrices and this approach can be efficiently used for the traceability of TSI samples using the microorganisms’ profiles as indicators of their origin. - Highlights: • Sludge- and green- based biosolids analysed by metagenomics. • Biosolids may introduce microbial hazards in the food chain. • Metagenomics enables tracking biosolids’ sources.

  20. Functional metagenomics to decipher food-microbe-host crosstalk.

    Science.gov (United States)

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    The recent developments of metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics open a new window for the exploration of food-microbe-host crosstalk.

  1. Comparative analysis of metagenomes of Italian top soil improvers

    Energy Technology Data Exchange (ETDEWEB)

    Gigliucci, Federica, E-mail: Federica.gigliucci@libero.it [Department of Veterinary Public Health and Food Safety, Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Rome (Italy); Department of Sciences, University Roma Tre, Viale Marconi, 446, 00146 Rome (Italy); Brambilla, Gianfranco; Tozzoli, Rosangela; Michelacci, Valeria; Morabito, Stefano [Department of Veterinary Public Health and Food Safety, Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Rome (Italy)]

    2017-05-15

    Biosolids originating from municipal wastewater treatment plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon to agricultural land. Their use to amend soil is controversial, as it may introduce emerging hazards of anthropogenic or animal origin into environments devoted to food production. In this study, we used shotgun metagenomic sequencing to characterize the hazards associated with TSIs. The samples contained many virulence genes associated with different diarrheagenic E. coli pathotypes, as well as a variety of antimicrobial resistance-associated genes. Genes conferring resistance to fluoroquinolones were the most prevalent class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent, traits associated with methicillin resistance in staphylococci and genes conferring resistance to streptothricin, fosfomycin and vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15–50% of the sequence reads in the different metagenomes mapping to the class of resistance-to-compounds determinants. Moreover, taxonomic analysis comparing compost-based samples and biosolids derived from municipal sewage-sludge treatment divided the samples into separate populations based on microbiota composition. The results confirm that metagenomics can efficiently detect genomic traits associated with pathogens and antimicrobial resistance in complex matrices, and that this approach can be used for the traceability of TSI samples, with microbial profiles serving as indicators of their origin. - Highlights: • Sludge- and green-based biosolids analysed by metagenomics. • Biosolids may introduce microbial hazards in the food chain. • Metagenomics enables tracking biosolids' sources.

  2. What do professional forecasters' stock market expectations tell us about herding, information extraction and beauty contests?

    DEFF Research Database (Denmark)

    Rangvid, Jesper; Schmeling, M.; Schrimpf, A.

    2013-01-01

    We study how professional forecasters form equity market expectations based on a new micro-level dataset which includes rich cross-sectional information about individual characteristics. We focus on testing whether agents rely on the beliefs of others, i.e., consensus expectations, when forming their own forecast. We find strong evidence that the average of all forecasters' beliefs influences an individual's own forecast. This effect is stronger for young and less experienced forecasters as well as forecasters whose pay depends more on performance relative to a benchmark. Further tests indicate...

  3. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir

    2011-10-12

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  4. CLASSIFICATION OF INFORMAL SETTLEMENTS THROUGH THE INTEGRATION OF 2D AND 3D FEATURES EXTRACTED FROM UAV DATA

    Directory of Open Access Journals (Sweden)

    C. M. Gevaert

    2016-06-01

    Unmanned Aerial Vehicles (UAVs) are capable of providing very high resolution and up-to-date information to support informal settlement upgrading projects. In order to provide accurate basemaps, urban scene understanding through the identification and classification of buildings and terrain is imperative. However, common characteristics of informal settlements, such as small, irregular buildings with heterogeneous roof material and a large amount of clutter, challenge state-of-the-art algorithms. In particular, dense buildings and steeply sloped terrain make it difficult to identify elevated objects. This work investigates how 2D radiometric and textural features, 2.5D topographic features, and 3D geometric features obtained from UAV imagery can be integrated to obtain high classification accuracy in challenging classification problems for the analysis of informal settlements. It compares the utility of pixel-based and segment-based features obtained from an orthomosaic and DSM with point-based and segment-based features extracted from the point cloud to classify an unplanned settlement in Kigali, Rwanda. Findings show that the integration of 2D and 3D features leads to higher classification accuracies.
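
    A hedged sketch of the feature-fusion idea described above, assuming per-segment feature tables are already available; the arrays, class labels and feature meanings below are placeholders, not the authors' data or implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_segments = 500
f2d = rng.random((n_segments, 8))    # 2D radiometric/textural features (e.g. mean bands, GLCM)
f25d = rng.random((n_segments, 3))   # 2.5D topographic features (e.g. nDSM height statistics)
f3d = rng.random((n_segments, 6))    # 3D geometric features (e.g. planarity, verticality)

X = np.hstack([f2d, f25d, f3d])      # simple feature-level fusion of the three groups
y = rng.integers(0, 3, n_segments)   # dummy classes: building / terrain / clutter

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```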

  5. Phylogeny and phylogeography of functional genes shared among seven terrestrial subsurface metagenomes reveal N-cycling and microbial evolutionary relationships

    Directory of Open Access Journals (Sweden)

    Maggie CY Lau

    2014-10-01

    Comparative studies on community phylogenetics and phylogeography of microorganisms living in extreme environments are rare. Terrestrial subsurface habitats are valuable for studying microbial biogeographical patterns due to their isolation and restricted dispersal mechanisms. Since the taxonomic identity of a microorganism does not always correspond well with its functional role in a particular community, the use of taxonomic assignments or patterns may give limited inference on how microbial functions are affected by historical, geographical and environmental factors. With seven metagenomic libraries generated from fracture water samples collected from five South African mines, this study was carried out to (1) screen for ubiquitous functions or pathways of biogeochemical cycling of CH4, S and N; (2) characterize the biodiversity represented by the common functional genes; (3) investigate the subsurface biogeography as revealed by this subset of genes; and (4) explore the possibility of using metagenomic data for evolutionary study. The ubiquitous functional genes are NarV, NPD, PAP reductase, NifH, NifD, NifK, NifE and NifN genes. Although these 8 common functional genes were taxonomically and phylogenetically diverse and distinct from each other, the dissimilarity between samples did not correlate strongly with geographical distance, environmental conditions or residence time of the water. Por genes homologous to those of Thermodesulfovibrio yellowstonii detected in all metagenomes were deep lineages of Nitrospirae, suggesting that subsurface habitats have preserved ancestral genetic signatures that inform the study of the origin and evolution of prokaryotes.

  6. Integrating Metagenomics and NanoSIMS to Investigate the Evolution and Ecophysiology of Magnetotactic Bacteria

    Science.gov (United States)

    Lin, W.; Zhang, W.; He, M.; Pan, Y.

    2017-12-01

    Magnetotactic bacteria (MTB) synthesize intracellular nano-sized magnetite (Fe3O4) and/or greigite (Fe3S4) crystals, called magnetosomes, which impart a permanent magnetic dipole moment to the cell, causing it to align along the geomagnetic field lines as it swims. MTB play essential roles in the global cycling of Fe, S, N and C, and represent an excellent model system not only for investigating the mechanisms of the microbial engines that drive Earth's biogeochemical cycles but also for studying magnetotaxis and microbial biomineralization. Most previous studies of MTB were based on 16S rRNA gene-targeting analyses, which are powerful approaches for characterizing the diversity, ecology and biogeography of MTB in nature but are somewhat limited in the physiological detail they can provide. In the present study, we combined genome-resolved metagenomics and nanoscale secondary ion mass spectrometry (NanoSIMS) analyses to study the genomic information, biomineralization mechanisms and metabolic potential of environmental MTB. Two nearly complete genomes from uncultivated MTB belonging to the Nitrospirae phylum were reconstructed, and their proposed metabolisms were further investigated and confirmed through NanoSIMS analyses. These results improve our understanding of the ecophysiology and evolution of MTB and their environmental function. The development of an integrated metagenomics-NanoSIMS approach will provide a powerful tool for research in geomicrobiology and environmental microbiology.

  7. A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE.

    Directory of Open Access Journals (Sweden)

    Kevin P Keegan

    We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g., Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
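
    A hedged sketch of the general idea behind duplicate-read-based error estimation, not the actual DRISEE implementation: reads sharing an identical prefix are treated as artifactual duplicates, and disagreement with their per-position consensus is counted as error. The prefix length and example reads are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict

def positional_error(reads, prefix_len=20):
    """Estimate a per-position error rate from groups of prefix-identical reads."""
    bins = defaultdict(list)
    for r in reads:
        if len(r) > prefix_len:
            bins[r[:prefix_len]].append(r)
    errors, totals = Counter(), Counter()
    for group in bins.values():
        if len(group) < 2:
            continue  # need duplicates to form a consensus
        length = min(len(r) for r in group)
        for pos in range(prefix_len, length):
            column = [r[pos] for r in group]
            consensus = Counter(column).most_common(1)[0][0]
            errors[pos] += sum(base != consensus for base in column)
            totals[pos] += len(column)
    return {pos: errors[pos] / totals[pos] for pos in totals}

reads = ["ACGTACGTACGTACGTACGTTTGCA",
         "ACGTACGTACGTACGTACGTTTGGA",
         "ACGTACGTACGTACGTACGTTTGCA"]
print(positional_error(reads))  # error fraction per read position beyond the prefix
```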

  8. A method to extract quantitative information in analyzer-based x-ray phase contrast imaging

    International Nuclear Information System (INIS)

    Pagot, E.; Cloetens, P.; Fiedler, S.; Bravin, A.; Coan, P.; Baruchel, J.; Haertwig, J.; Thomlinson, W.

    2003-01-01

    Analyzer-based imaging is a powerful phase-sensitive technique that generates improved contrast compared to standard absorption radiography. Combining numerically two images taken on either side of the rocking curve, at ±1/2 of its full width at half-maximum (FWHM), provides images of 'pure refraction' and of 'apparent absorption'. In this study, a similar approach is taken by combining symmetrical images with respect to the peak of the analyzer rocking curve but at general positions, ±α·FWHM. Neither approach treats the ultra-small-angle scattering produced by the object as an independent quantity, which can lead to inconsistent results. An accurate way to retrieve separately the quantitative information intrinsic to the object is proposed; it is based on a statistical analysis of the local rocking curve and allows one to overcome the problems encountered with the previous approaches.
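
    A minimal sketch, in generic notation, of the standard two-image combination the abstract builds on (not the authors' statistical method): with images $I_L$ and $I_H$ acquired at analyzer positions $\theta_L$ and $\theta_H$ on either side of the rocking curve $R(\theta)$, a first-order expansion gives

    $$ I_{L,H} = I_R\left[\,R(\theta_{L,H}) + \left.\frac{dR}{d\theta}\right|_{\theta_{L,H}}\Delta\theta_z\,\right], $$

    which can be solved pixel-wise for the apparent-absorption image $I_R$ and the refraction-angle image $\Delta\theta_z$:

    $$ I_R = \frac{I_L\left(\frac{dR}{d\theta}\right)_{\theta_H} - I_H\left(\frac{dR}{d\theta}\right)_{\theta_L}}{R(\theta_L)\left(\frac{dR}{d\theta}\right)_{\theta_H} - R(\theta_H)\left(\frac{dR}{d\theta}\right)_{\theta_L}}, \qquad \Delta\theta_z = \frac{I_H\,R(\theta_L) - I_L\,R(\theta_H)}{I_L\left(\frac{dR}{d\theta}\right)_{\theta_H} - I_H\left(\frac{dR}{d\theta}\right)_{\theta_L}}. $$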

  9. Breast cancer and quality of life: medical information extraction from health forums.

    Science.gov (United States)

    Opitz, Thomas; Aze, Jérome; Bringay, Sandra; Joutard, Cyrille; Lavergne, Christian; Mollevi, Caroline

    2014-01-01

    Internet health forums are a rich textual resource with content generated through free exchanges among patients and, in certain cases, health professionals. We tackle the problem of retrieving clinically relevant information from such forums, with relevant topics being defined from clinical self-administered questionnaires. Texts in forums are largely unstructured and noisy, calling for adapted preprocessing and query methods. We minimize the number of false negatives in queries by using a synonym tool to achieve query expansion of the initial topic keywords. To avoid false positives, we propose a new measure based on a statistical comparison of frequent co-occurrences in a large reference corpus (the Web) to keep only relevant expansions. Our work is motivated by a study of the health-related quality of life (QoL) of breast cancer patients. We consider topics defined from a breast-cancer-specific QoL questionnaire, quantify and structure their occurrences in posts of a specialized French forum, and outline important future developments.
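
    A hedged sketch of co-occurrence-based filtering of candidate query expansions, using pointwise mutual information against a toy reference corpus; the measure, corpus and terms are illustrative assumptions, not the authors' exact statistic.

```python
import math
from collections import Counter

def pmi(term_a, term_b, docs):
    """Pointwise mutual information of two terms over a set of tokenised documents."""
    n = len(docs)
    a = sum(term_a in d for d in docs)
    b = sum(term_b in d for d in docs)
    ab = sum(term_a in d and term_b in d for d in docs)
    if ab == 0 or a == 0 or b == 0:
        return float("-inf")
    return math.log((ab / n) / ((a / n) * (b / n)))

# toy "reference corpus": each document is a set of tokens
docs = [{"fatigue", "tiredness", "cancer"},
        {"fatigue", "exhaustion"},
        {"bank", "river"},
        {"fatigue", "tiredness"}]

seed = "fatigue"
candidates = ["tiredness", "exhaustion", "river"]
kept = [c for c in candidates if pmi(seed, c, docs) > 0.0]
print(kept)  # expansions that co-occur with the seed term more than by chance
```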

  10. EXTRACTION OF BENTHIC COVER INFORMATION FROM VIDEO TOWS AND PHOTOGRAPHS USING OBJECT-BASED IMAGE ANALYSIS

    Directory of Open Access Journals (Sweden)

    M. T. L. Estomata

    2012-07-01

    Mapping benthic cover in deep waters accounts for only a very small proportion of studies in the field. The majority of benthic cover mapping makes use of satellite images, and classification is usually carried out only for shallow waters. To map the seafloor in optically deep waters, underwater videos and photos are needed. Some researchers have applied this approach to underwater photos, but made use of different classification methods such as neural networks and rapid classification via down-sampling. In this study, accurate bathymetric data obtained using a multi-beam echo sounder (MBES) were intended to be used as complementary data with the underwater photographs. Due to the absence of a motion reference unit (MRU), which applies corrections to the data gathered by the MBES, the accuracy of the depth data was compromised. Nevertheless, even without accurate bathymetric data, object-based image analysis (OBIA), which used rule sets based on information such as shape, size, area, relative distance, and spectral information, was still applied. Compared to pixel-based classifications, OBIA was able to classify more specific benthic cover types beyond coral and sand, such as rubble and fish. Through the use of rule sets on area (less than or equal to 700 pixels for fish and between 700 and 10,000 pixels for rubble), as well as standard deviation values to distinguish texture, fish and rubble were identified. OBIA produced benthic cover maps with a higher overall accuracy, 93.78±0.85%, compared to pixel-based methods, which had an average accuracy of only 87.30±6.11% (p-value = 0.0001, α = 0.05).
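
    A hedged sketch of the kind of rule-set classification described above; the area thresholds follow the abstract (≤700 pixels for fish, 700–10,000 pixels for rubble), while the standard-deviation cut-off and the example segments are invented for illustration.

```python
def classify_segment(area_px, intensity_std):
    """Assign a benthic cover class to one image segment using simple rules."""
    textured = intensity_std > 15.0          # assumed texture cut-off (placeholder)
    if area_px <= 700 and textured:
        return "fish"
    if 700 < area_px <= 10_000 and textured:
        return "rubble"
    return "coral" if textured else "sand"

segments = [{"area": 450, "std": 22.0},
            {"area": 5_000, "std": 18.0},
            {"area": 30_000, "std": 4.0}]
for seg in segments:
    print(seg, "->", classify_segment(seg["area"], seg["std"]))
```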

  11. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Copeland, Alex; Brown, C. Titus

    2011-10-13

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  12. Evaluation of the Cow Rumen Metagenome: Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  13. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Sakakibara, Yasubumi

    2011-10-13

    Keio University's Yasubumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  14. Information Extraction and Interpretation Analysis of Mineral Potential Targets Based on ETM+ Data and GIS technology: A Case Study of Copper and Gold Mineralization in Burma

    International Nuclear Information System (INIS)

    Wenhui, Du; Yongqing, Chen; Nana, Guo; Yinglong, Hao; Pengfei, Zhao; Gongwen, Wang

    2014-01-01

    Mineralization-alteration and structural information extraction plays an important role in mineral resource prospecting and assessment using remote sensing data and Geographical Information System (GIS) technology. Taking copper and gold mines in Burma as an example, the authors adopt band ratios, threshold segmentation and principal component analysis (PCA) to extract hydroxyl alteration information from ETM+ remote sensing images. A digital elevation model (DEM) (30 m spatial resolution) and ETM+ data were used to extract linear and circular faults that are associated with copper and gold mineralization. Combining geological data with the above information, the weights-of-evidence method and the C-A fractal model were used to integrate the evidence layers and identify ore-forming favourable zones in this area. Research results show that the high-grade potential targets coincide with the known copper and gold deposits, and the integrated information can be used to guide further exploration and mineral resource decision-making.
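
    A hedged sketch of a band-ratio alteration index with threshold segmentation of the kind mentioned above (a SWIR band ratio such as ETM+ band 5 / band 7 is commonly used to highlight hydroxyl-bearing alteration); the reflectance arrays and the statistical threshold are placeholders, not the authors' parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
band5 = rng.uniform(0.1, 0.6, size=(100, 100))   # SWIR1 reflectance (dummy scene)
band7 = rng.uniform(0.1, 0.6, size=(100, 100))   # SWIR2 reflectance (dummy scene)

ratio = band5 / band7                            # hydroxyl-bearing alteration tends to raise 5/7
threshold = ratio.mean() + 2 * ratio.std()       # simple statistical segmentation cut-off
alteration_mask = ratio > threshold              # candidate alteration pixels
print("flagged pixels:", int(alteration_mask.sum()))
```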

  15. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.

    Science.gov (United States)

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well suited to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in the literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4%, and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive Task (IAT) track with BELIEF revealed a System Usability Scale (SUS) score of 67. Considering the complexity of the task for new users (learning BEL, working with a completely new interface, and performing complex curation), a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/. © The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Citizen-Centric Urban Planning through Extracting Emotion Information from Twitter in an Interdisciplinary Space-Time-Linguistics Algorithm

    Directory of Open Access Journals (Sweden)

    Bernd Resch

    2016-07-01

    Traditional urban planning processes typically happen in offices and behind desks. Modern types of civic participation can enhance those processes by acquiring citizens' ideas and feedback in participatory sensing approaches like "People as Sensors". As such, citizen-centric planning can be achieved by analysing Volunteered Geographic Information (VGI) data such as Twitter tweets and posts from other social media channels. These user-generated data comprise several information dimensions, such as spatial and temporal information, and textual content. However, in previous research, these dimensions were generally examined separately in single-disciplinary approaches, which does not allow for holistic conclusions in urban planning. This paper introduces TwEmLab, an interdisciplinary approach towards extracting citizens' emotions in different locations within a city. More concretely, we analyse tweets in three dimensions (space, time, and linguistics), based on similarities between each pair of tweets as defined by a specific set of functional relationships in each dimension. We use a graph-based semi-supervised learning algorithm to classify the data into discrete emotions (happiness, sadness, fear, anger/disgust, none). Our proposed solution allows tweets to be classified into emotion classes in a multi-parametric approach. Additionally, we created a manually annotated gold standard that can be used to evaluate TwEmLab's performance. Our experimental results show that we are able to identify tweets carrying emotions and that our approach bears extensive potential to reveal new insights into citizens' perceptions of the city.
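
    A hedged sketch of graph-based semi-supervised emotion labelling in the spirit of the approach above, not the TwEmLab implementation: tweets are described by normalised spatial, temporal and simple text features, an affinity graph is built implicitly via an RBF kernel, and labels spread from a few annotated tweets. All data below are dummy placeholders.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(3)
n = 60
space = rng.random((n, 2))            # normalised x/y coordinates of each tweet
time_of_day = rng.random((n, 1))      # normalised timestamp
text_feat = rng.random((n, 5))        # e.g. emotion-lexicon scores per tweet
X = np.hstack([space, time_of_day, text_feat])

labels = np.full(n, -1)               # -1 marks unlabelled tweets
labels[:6] = [0, 1, 2, 3, 4, 0]       # a few annotated tweets (5 emotion classes)

model = LabelSpreading(kernel="rbf", gamma=5.0).fit(X, labels)
print(model.transduction_[:10])       # propagated emotion labels for the first tweets
```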

  17. Eodataservice.org: Big Data Platform to Enable Multi-disciplinary Information Extraction from Geospatial Data

    Science.gov (United States)

    Natali, S.; Mantovani, S.; Barboni, D.; Hogan, P.

    2017-12-01

    In 1999, US Vice-President Al Gore outlined the concept of 'Digital Earth' as a multi-resolution, three-dimensional representation of the planet to find, visualise and make sense of vast amounts of geo-referenced information on physical and social environments, allowing users to navigate through space and time and to access historical and forecast data in support of scientists, policy-makers, and any other user. The eodataservice platform (http://eodataservice.org/) implements the Digital Earth concept: eodataservice is a cross-domain platform that makes available a large set of multi-year global environmental collections allowing data discovery, visualization, combination, processing and download. It implements a "virtual datacube" approach in which data stored in distributed data centers are made available via standardized OGC-compliant interfaces. Dedicated web-based graphical user interfaces (based on the ESA-NASA WebWorldWind technology), web-based notebooks (e.g. Jupyter notebooks), desktop GIS tools and command-line interfaces can be used to access and manipulate the data. The platform can be fully customized to users' needs. So far eodataservice has been used for the following thematic applications: high-resolution satellite data distribution; land surface monitoring using SAR surface deformation data; atmosphere, ocean and climate applications; climate-health applications; urban environment monitoring; safeguarding of cultural heritage sites; and support to farmers and (re)insurances in the agricultural field. In the current work, the EO Data Service concept is presented as a key enabling technology; furthermore, various examples are provided to demonstrate the high level of interdisciplinarity of the platform.

  18. [Mini review] metagenomic studies of the Red Sea

    KAUST Repository

    Behzad, Hayedeh; Ibarra, Martin Augusto; Mineta, Katsuhiko; Gojobori, Takashi

    2015-01-01

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied amongst marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physicochemical studies describe the Red Sea as an oligotrophic environment that contains some of the warmest and saltiest waters in the world with year-round high UV radiation. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmolyte C1 oxidation was found in Red Sea metagenomic databases, suggesting acquisition of specific environmental adaptations by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based applications. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios, where global ocean temperatures are expected to rise by 1–3 °C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the

  19. [Mini review] metagenomic studies of the Red Sea

    KAUST Repository

    Behzad, Hayedeh

    2015-10-23

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied amongst marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physicochemical studies describe the Red Sea as an oligotrophic environment that contains some of the warmest and saltiest waters in the world with year-round high UV radiation. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmolyte C1 oxidation was found in Red Sea metagenomic databases, suggesting acquisition of specific environmental adaptations by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based applications. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios, where global ocean temperatures are expected to rise by 1–3 °C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the

  20. A METHOD OF EXTRACTING SHORELINE BASED ON SEMANTIC INFORMATION USING DUAL-LENGTH LiDAR DATA

    Directory of Open Access Journals (Sweden)

    C. Yao

    2017-09-01

    A shoreline is a spatially varying separation between water and land. By utilizing dual-wavelength LiDAR point data together with the semantic information that the shoreline often appears beyond the water surface profile and is observable on the beach, the paper generates the shoreline as follows. (1) Obtain the water surface profile: we first obtain the water surface by roughly selecting water points based on several features of the water body, then apply a least-squares fitting method to obtain the overall water trend surface. We then derive the ground surface connecting to the underwater surface using both a TIN progressive filtering method and surface interpolation. The two fitted surfaces are intersected to obtain the water surface profile of the island. (2) Obtain the sandy beach: we grid all points, select the water surface profile grid cells as seeds, and extract sandy beach points based on an eight-neighbourhood method and beach features, yielding all sandy beaches. (3) Obtain the island shoreline: we first derive the sandy beach shoreline from intensity information, using a threshold value to distinguish wet from dry areas, thereby obtaining the shoreline of several sandy beaches. Since the shoreline has, to some extent, the same height values within a small area, all the sandy shoreline points are used to fit a plane P, and the intersection line of the ground surface with the shoreline plane P can be regarded as the island shoreline. Comparison with a surveyed shoreline shows that the proposed method can successfully extract the shoreline.
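
    A hedged sketch of the final shoreline step described above (not the authors' code): fit a plane z = ax + by + c to sandy-beach shoreline points by least squares, then flag ground-surface grid cells lying close to that plane; the points, surfaces and tolerance are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
pts = rng.random((200, 2)) * 100.0                          # x, y of sandy shoreline points
z = 1.2 + 0.001 * pts[:, 0] + rng.normal(0, 0.02, 200)      # near-constant shoreline elevation

A = np.column_stack([pts, np.ones(len(pts))])               # design matrix [x y 1]
(a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)           # least-squares plane coefficients

gx, gy = np.meshgrid(np.arange(0, 100, 1.0), np.arange(0, 100, 1.0))
ground_z = 1.2 + 0.001 * gx + rng.normal(0, 0.05, gx.shape) # dummy ground surface grid
plane_z = a * gx + b * gy + c
shoreline_mask = np.abs(ground_z - plane_z) < 0.05          # cells near the fitted plane
print("shoreline cells:", int(shoreline_mask.sum()))
```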

  1. Metagenomic sequence of saline desert microbiota from wild ass sanctuary, Little Rann of Kutch, Gujarat, India.

    Science.gov (United States)

    Patel, Rajesh; Mevada, Vishal; Prajapati, Dhaval; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G

    2015-03-01

    We report a metagenome from a saline desert soil sample from the Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences totalling 141,307,202 bp with 56% G + C content. Metagenome sequence data are available at EBI under the EBI Metagenomics database with accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla, with the genera Marinobacter (48.7%) and Halobacterium (4.6%) dominating the bacterial and archaeal domains, respectively. Remarkably, functional metagenome analysis assigned 18.2% of sequences to poorly characterized groups and 4% to genes for various stress responses, and revealed a versatile repertoire of commercially relevant enzymes.

  2. High throughtput comparisons and profiling of metagenomes for industrially relevant enzymes

    KAUST Repository

    Alam, Intikhab

    2016-01-26

    More and more genomes and metagenomes are being sequenced since the advent of Next Generation Sequencing (NGS) technologies. Many metagenomic samples are collected from a variety of environments, each exhibiting a different environmental profile (e.g. temperature, environmental chemistry). These metagenomes can be profiled to unearth enzymes relevant to several industries based on specific enzyme properties, such as the ability to work under extreme conditions (extreme temperatures, high salinity, anaerobic conditions, and so on). In this work, we present the DMAP platform, comprising a high-throughput metagenomic annotation pipeline and a data warehouse for comparisons and profiling across large numbers of metagenomes. We developed two reference databases for profiling of important genes, one containing enzymes related to different industries and the other containing genes with potential bioactivity roles. In this presentation we describe an example analysis of a large number of publicly available metagenomic samples from the TARA Oceans study (Science, 2015), which covers a significant part of the world's oceans.

  3. Metagenomic analysis of lysogeny in Tampa Bay: implications for prophage gene expression.

    Directory of Open Access Journals (Sweden)

    Lauren McDaniel

    Phage integrase genes often play a role in the establishment of lysogeny in temperate phage by catalyzing the integration of the phage into one of the host's replicons. To investigate temperate phage gene expression, an induced viral metagenome from Tampa Bay was sequenced by 454/pyrosequencing. The sequencing yielded 294,068 reads, of which 6.6% were identifiable. One hundred and three sequences had significant similarity to integrases by BLASTX analysis (e ≤ 0.001). Four sequences with the strongest amino-acid-level similarity to integrases were selected, and real-time PCR primers and probes were designed. Initial testing with microbial fraction DNA from Tampa Bay revealed 1.9 × 10⁷ and 1300 gene copies L⁻¹ of Vibrio-like integrase and Oceanicola-like integrase, respectively. The other two integrases were not detected. The integrase assay was then tested on microbial fraction RNA extracted from 200 ml of Tampa Bay water sampled biweekly over a 12-month time series. Vibrio-like integrase gene expression was detected in three samples, with estimated copy numbers of 2.4–1280 L⁻¹. Clostridium-like integrase gene expression was detected in 6 samples, with estimated copy numbers of 37 to 265 L⁻¹. In all cases, detection of integrase gene expression corresponded to the occurrence of lysogeny as detected by prophage induction. Investigation of the environmental distribution of the two expressed integrases in the Global Ocean Survey database found that the Vibrio-like integrase was present in genome equivalents of 3.14% of microbial libraries and all four viral metagenomes. There were two similar genes in the library from British Columbia, and one similar gene was detected in both the Gulf of Mexico and Sargasso Sea libraries. In contrast, eleven similar genes were observed in the Arctic library. The Clostridium-like integrase was less prevalent, being found in 0.58% of the microbial and none of the viral libraries. These results underscore the value of metagenomic data

  4. Metagenomes from two microbial consortia associated with Santa Barbara seep oil.

    Science.gov (United States)

    Hawley, Erik R; Malfatti, Stephanie A; Pagani, Ioanna; Huntemann, Marcel; Chen, Amy; Foster, Brian; Copeland, Alexander; del Rio, Tijana Glavina; Pati, Amrita; Jansson, Janet R; Gilbert, Jack A; Tringe, Susannah Green; Lorenson, Thomas D; Hess, Matthias

    2014-12-01

    The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean off the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. A Statistical Framework for the Functional Analysis of Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    Sharon, Itai; Pati, Amrita; Markowitz, Victor; Pinter, Ron Y.

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
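
    A minimal sketch, not the paper's exact formulation, of the kind of read-count model a Lander-Waterman-style framework rests on: assuming $N$ reads of length $\ell$ whose start positions are uniform over a community of total genome length $G$, the count $X_f$ of reads overlapping a gene family occupying aggregate length $L_f$ has expectation

    $$ \mathbb{E}[X_f] \;\approx\; \frac{N\,(L_f + \ell - 1)}{G}, $$

    so a simple length-corrected estimate of the relative frequency of family $f$ is

    $$ \hat{\rho}_f \;=\; \frac{X_f/(L_f+\ell-1)}{\sum_g X_g/(L_g+\ell-1)}, $$

    with the reliability of $\hat{\rho}_f$ assessed, for example, from the (approximately Poisson) variance of $X_f$.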

  6. Metagenomic species profiling using universal phylogenetic marker genes

    DEFF Research Database (Denmark)

    Sunagawa, Shinichi; Mende, Daniel R; Zeller, Georg

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome-based methods. An implementation of the method is available at http://www.bork.embl.de/software/mOTU/.

  7. Extremozymes from metagenome: Potential applications in food processing.

    Science.gov (United States)

    Khan, Mahejibin; Sathya, T A

    2017-06-12

    The long-established use of enzymes for food processing and product formulation has resulted in an enzyme market growing at a 7.0% compound annual growth rate. Advances in molecular biology, and the recognition that enzymes with specific properties have applications in the industrial production of infant, baby and functional foods, have boosted research toward sourcing microbial genes encoding enzymes with distinctive properties. In this regard, functional metagenomics for extremozymes has gained attention on the premise that such enzymes can catalyze specific reactions. Hence, metagenomics, which can isolate functional genes of unculturable extremophilic microorganisms, has attracted attention as a promising tool. Developments in this field of research in relation to the food sector are reviewed.

  8. Metagenome of a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones

    Energy Technology Data Exchange (ETDEWEB)

    Walsh, David A.; Zaikova, Elena; Howes, Charles L.; Song, Young; Wright, Jody; Tringe, Susannah G.; Tortell, Philippe D.; Hallam, Steven J.

    2009-07-15

    Oxygen minimum zones (OMZs), also known as oceanic "dead zones", are widespread oceanographic features currently expanding due to global warming and coastal eutrophication. Although inhospitable to metazoan life, OMZs support a thriving but cryptic microbiota whose combined metabolic activity is intimately connected to nutrient and trace gas cycling within the global ocean. Here we report time-resolved metagenomic analyses of a ubiquitous and abundant but uncultivated OMZ microbe (SUP05) closely related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation and nitrate respiration responsive to a wide range of water column redox states. Thus, SUP05 plays integral roles in shaping nutrient and energy flow within oxygen-deficient oceanic waters via carbon sequestration, sulfide detoxification and biological nitrogen loss, with important implications for marine productivity and atmospheric greenhouse control.

  9. Diverse circovirus-like genome architectures revealed by environmental metagenomics.

    Science.gov (United States)

    Rosario, Karyna; Duffy, Siobain; Breitbart, Mya

    2009-10-01

    Single-stranded DNA (ssDNA) viruses with circular genomes are the smallest viruses known to infect eukaryotes. The present study identified 10 novel genomes similar to ssDNA circoviruses through data-mining of public viral metagenomes. The metagenomic libraries included samples from reclaimed water and three different marine environments (Chesapeake Bay, British Columbia coastal waters and Sargasso Sea). All the genomes have similarities to the replication (Rep) protein of circoviruses; however, only half have genomic features consistent with known circoviruses. Some of the genomes exhibit a mixture of genomic features associated with different families of ssDNA viruses (i.e. circoviruses, geminiviruses and parvoviruses). Unique genome architectures and phylogenetic analysis of the Rep protein suggest that these viruses belong to novel genera and/or families. Investigating the complex community of ssDNA viruses in the environment can lead to the discovery of divergent species and help elucidate evolutionary links between ssDNA viruses.

  10. A metagenomic framework for the study of airborne microbial communities.

    Science.gov (United States)

    Yooseph, Shibu; Andrews-Pfannkoch, Cynthia; Tenney, Aaron; McQuaid, Jeff; Williamson, Shannon; Thiagarajan, Mathangi; Brami, Daniel; Zeigler-Allen, Lisa; Hoffman, Jeff; Goll, Johannes B; Fadrosh, Douglas; Glass, John; Adams, Mark D; Friedman, Robert; Venter, J Craig

    2013-01-01

    Understanding the microbial content of the air has important scientific, health, and economic implications. While studies have primarily characterized the taxonomic content of air samples by sequencing the 16S or 18S ribosomal RNA gene, direct analysis of the genomic content of airborne microorganisms has not been possible due to the extremely low density of biological material in airborne environments. We developed sampling and amplification methods to enable adequate DNA recovery to allow metagenomic profiling of air samples collected from indoor and outdoor environments. Air samples were collected from a large urban building, a medical center, a house, and a pier. Analyses of metagenomic data generated from these samples reveal airborne communities with a high degree of diversity and different genera abundance profiles. The identities of many of the taxonomic groups and protein families also allows for the identification of the likely sources of the sampled airborne bacteria.

  11. Construction and Screening of Marine Metagenomic Large Insert Libraries.

    Science.gov (United States)

    Weiland-Bräuer, Nancy; Langfeldt, Daniela; Schmitz, Ruth A

    2017-01-01

    The marine environment covers more than 70% of the world's surface. Marine microbial communities are highly diverse and have evolved through extended processes of physiological adaptation under the influence of a variety of ecological conditions and selection pressures. They harbor an enormous diversity of microbes with still unknown and probably novel physiological characteristics. In the past, marine microbes, mostly bacteria of microbial consortia attached to the tissues of marine multicellular organisms, have proven to be a rich source of highly potent bioactive compounds, which represent a considerable number of drug candidates. However, to date, the biodiversity of marine microbes and the versatility of their bioactive compounds and metabolites have not been fully explored. This chapter describes sampling in the marine environment, the construction of metagenomic large-insert libraries from marine habitats, and, as an example, one function-based screen of metagenomic clones for the identification of quorum quenching activities.

  12. The new science of metagenomics: revealing the secrets of our microbial planet

    National Research Council Canada - National Science Library

    Committee on Metagenomics: Challenges and Functional Applications, National Research Council

    2007-01-01

    .... The emerging field of metagenomics offers a new way of exploring the microbial world that will transform modern microbiology and lead to practical applications in medicine, agriculture, alternative...

  13. An audit of the reliability of influenza vaccination and medical information extracted from eHealth records in general practice.

    Science.gov (United States)

    Regan, Annette K; Gibbs, Robyn A; Effler, Paul V

    2018-05-31

    To evaluate the reliability of information in general practice (GP) electronic health records (EHRs), 2100 adult patients were randomly selected for interview regarding the presence of specific medical conditions and recent influenza vaccination. Agreement between self-report and data extracted from EHRs was compared using Cohen's kappa coefficient (κ) and interpreted in accordance with Altman's Kappa Benchmarking criteria; 377 (18%) patients declined participation, and 608 (29%) could not be contacted. Of 1115 (53%) remaining, 856 (77%) were active patients (≥3 visits to the GP practice in the last two years) who provided complete information for analysis. Although a higher proportion of patients self-reported being vaccinated or having a medical condition compared to the EHR (50.7% vs 36.9%, and 39.4% vs 30.3%, respectively), there was "good" agreement between self-report and EHR for both vaccination status (κ = 0.67) and medical conditions (κ = 0.66). These findings suggest EHR may be useful for public health surveillance. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
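
    For reference, Cohen's kappa corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance; the worked numbers below are a hypothetical 2×2 example, not the study's data:

    $$ \kappa = \frac{p_o - p_e}{1 - p_e}. $$

    For instance, if 100 patients gave 40 yes/yes, 35 no/no, 10 yes/no and 15 no/yes answers when self-report is compared with the EHR, then $p_o = 0.75$, $p_e = 0.5 \times 0.55 + 0.5 \times 0.45 = 0.5$, and $\kappa = (0.75 - 0.5)/(1 - 0.5) = 0.5$, i.e. "moderate" agreement on the Altman scale, whereas the study's values of 0.66-0.67 fall in the "good" band.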

  14. Metagenome-derived haloalkane dehalogenases with novel catalytic properties

    Czech Academy of Sciences Publication Activity Database

    Kotík, Michael; Vaňáček, P.; Kuňka, A.; Prokop, Z.; Dambrovský, J.

    2017-01-01

    Roč. 101, č. 16 (2017), s. 6385-6397 ISSN 0175-7598 R&D Projects: GA ČR GAP504/10/0137; GA MŠk(CZ) LM2015047; GA MŠk(CZ) LM2015055 Institutional support: RVO:61388971 Keywords : Haloalkane dehalogenase * Metagenomic DNA * Heterologous production Subject RIV: CE - Biochemistry OBOR OECD: Biochemistry and molecular biology Impact factor: 3.420, year: 2016

  15. Bioprospecting metagenomics of decaying wood: mining for new glycoside hydrolases

    Directory of Open Access Journals (Sweden)

    Li Luen-Luen

    2011-08-01

    Background: To efficiently deconstruct recalcitrant plant biomass to fermentable sugars in industrial processes, biocatalysts of higher performance and lower cost are required. The genetic diversity found in the metagenomes of natural microbial biomass decay communities may harbor such enzymes. Our goal was to discover and characterize new glycoside hydrolases (GHases) from microbial biomass decay communities, especially those from unknown or never previously cultivated microorganisms. Results: From the metagenome sequences of an anaerobic microbial community actively decaying poplar biomass, we identified approximately 4,000 GHase homologs. Based on homology to GHase families/activities of interest and the quality of the sequences, candidates were selected for full-length cloning and subsequent expression. As an alternative strategy, a metagenome expression library was constructed and screened for GHase activities. These combined efforts resulted in the cloning of four novel GHases that could be successfully expressed in Escherichia coli. Further characterization showed that two enzymes had significant activity on p-nitrophenyl-α-L-arabinofuranoside, one enzyme had significant activity against p-nitrophenyl-β-D-glucopyranoside, and one enzyme showed significant activity against p-nitrophenyl-β-D-xylopyranoside. The enzymes were also tested in the presence of ionic liquids. Conclusions: Metagenomics provides a good resource for mining novel biomass-degrading enzymes and for screening cellulolytic enzyme activities. The four GHases that were cloned may have potential application for deconstruction of biomass pretreated with ionic liquids, as they remain active in the presence of up to 20% ionic liquid (except for 1-ethyl-3-methylimidazolium diethyl phosphate). Alternatively, ionic liquids might be used to immobilize or stabilize these enzymes for minimal solvent processing of biomass.

  16. Forest harvesting reduces the soil metagenomic potential for biomass decomposition.

    Science.gov (United States)

    Cardenas, Erick; Kranabetter, J M; Hope, Graeme; Maas, Kendra R; Hallam, Steven; Mohn, William W

    2015-11-01

    Soil is the key resource that must be managed to ensure sustainable forest productivity. Soil microbial communities mediate numerous essential ecosystem functions, and recent studies show that forest harvesting alters soil community composition. From a long-term soil productivity study site in a temperate coniferous forest in British Columbia, 21 forest soil shotgun metagenomes were generated, totaling 187 Gb. A method to analyze unassembled metagenome reads from the complex community was optimized and validated. The subsequent metagenome analysis revealed that, 12 years after forest harvesting, there were 16% and 8% reductions in relative abundances of biomass decomposition genes in the organic and mineral soil layers, respectively. Organic and mineral soil layers differed markedly in genetic potential for biomass degradation, with the organic layer having greater potential and being more strongly affected by harvesting. Gene families were disproportionately affected, and we identified 41 gene families consistently affected by harvesting, including families involved in lignin, cellulose, hemicellulose and pectin degradation. The results strongly suggest that harvesting profoundly altered below-ground cycling of carbon and other nutrients at this site, with potentially important consequences for forest regeneration. Thus, it is important to determine whether these changes foreshadow long-term changes in forest productivity or resilience and whether these changes are broadly characteristic of harvested forests.

  17. Challenges of the Unknown: Clinical Application of Microbial Metagenomics

    Directory of Open Access Journals (Sweden)

    Graham Rose

    2015-01-01

    Availability of fast, high-throughput and low-cost whole-genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms against a complex and host-enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model, using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when it was applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and shows that this can be achieved using infrastructure available to non-dedicated sequencing centres.

  18. Bioinformatic approaches reveal metagenomic characterization of soil microbial community.

    Directory of Open Access Journals (Sweden)

    Zhuofei Xu

    Soil is a complex ecosystem harboring the greatest prokaryotic biodiversity on Earth. In recent years, the advent of high-throughput sequencing techniques has greatly facilitated the progress of soil ecological studies. However, how to effectively understand the underlying biological features of large-scale sequencing data is a new challenge. In the present study, we used 33 publicly available metagenomes from diverse soil sites (i.e. grassland, forest soil, desert, Arctic soil, and mangrove sediment) and integrated state-of-the-art computational tools to explore the phylogenetic and functional characteristics of the microbial communities in soil. Microbial composition and metabolic potential in soils were comprehensively illustrated at the metagenomic level. A spectrum of metagenomic biomarkers, comprising 46 taxa and 33 metabolic modules, was detected to be significantly differential and could serve as indicators to distinguish at least one of the five soil communities. The co-occurrence associations between complex microbial compositions and functions were inferred by network-based approaches. Our results, together with the established bioinformatic pipelines, should provide a foundation for future research into the relation between soil biodiversity and ecosystem function.

  19. PhyloSift: phylogenetic analysis of genomes and metagenomes.

    Science.gov (United States)

    Darling, Aaron E; Jospin, Guillaume; Lowe, Eric; Matsen, Frederick A; Bik, Holly M; Eisen, Jonathan A

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  20. PhyloSift: phylogenetic analysis of genomes and metagenomes

    Directory of Open Access Journals (Sweden)

    Aaron E. Darling

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  1. In-depth resistome analysis by targeted metagenomics.

    Science.gov (United States)

    Lanza, Val F; Baquero, Fernando; Martínez, José Luís; Ramos-Ruíz, Ricardo; González-Zorn, Bruno; Andremont, Antoine; Sánchez-Valenzuela, Antonio; Ehrlich, Stanislav Dusko; Kennedy, Sean; Ruppé, Etienne; van Schaik, Willem; Willems, Rob J; de la Cruz, Fernando; Coque, Teresa M

    2018-01-15

    Antimicrobial resistance is a major global health challenge. Metagenomics allows analyzing the presence and dynamics of "resistomes" (the ensemble of genes encoding antimicrobial resistance in a given microbiome) in disparate microbial ecosystems. However, the low sensitivity and specificity of available metagenomic methods preclude the detection of minority populations (often present below the detection threshold) and/or the identification of allelic variants that differ in the resulting phenotype. Here, we describe a novel strategy that combines targeted metagenomics, using last-generation in-solution capture platforms, with novel bioinformatics tools to establish a standardized framework that allows both quantitative and qualitative analyses of resistomes. We developed ResCap, a targeted sequence capture platform based on SeqCapEZ (NimbleGen) technology, which includes probes for 8667 canonical resistance genes (7963 antibiotic resistance genes and 704 genes conferring resistance to metals or biocides), 2517 relaxase genes (plasmid markers), and 78,600 genes homologous to the previously identified targets (47,806 for antibiotics and 30,794 for biocides or metals). Its performance was compared with metagenomic shotgun sequencing (MSS) for 17 fecal samples (9 human, 8 swine). ResCap significantly improves on MSS in detecting "gene abundance" (from 2.0 to 83.2%) and "gene diversity" (26 versus 14.9 genes unequivocally detected per sample per million reads, with the number of unequivocally mapped reads increasing up to 300-fold with ResCap), both calculated using novel bioinformatic tools. ResCap also facilitated the analysis of novel genes potentially involved in resistance to antibiotics, metals, biocides, or any combination thereof. ResCap, the first targeted sequence capture platform specifically developed to analyze resistomes, greatly enhances the sensitivity and specificity of available metagenomic methods and offers the possibility to analyze genes

  2. Extracting information on the spatial variability in erosion rate stored in detrital cooling age distributions in river sands

    Science.gov (United States)

    Braun, Jean; Gemignani, Lorenzo; van der Beek, Peter

    2018-03-01

    One of the main purposes of detrital thermochronology is to provide constraints on the regional-scale exhumation rate and its spatial variability in actively eroding mountain ranges. Procedures that use cooling age distributions coupled with hypsometry and thermal models have been developed in order to extract quantitative estimates of erosion rate and its spatial distribution, assuming steady state between tectonic uplift and erosion. This hypothesis precludes the use of these procedures to assess the likely transient response of mountain belts to changes in tectonic or climatic forcing. Other methods are based on an a priori knowledge of the in situ distribution of ages to interpret the detrital age distributions. In this paper, we describe a simple method that, using the observed detrital mineral age distributions collected along a river, allows us to extract information about the relative distribution of erosion rates in an eroding catchment without relying on a steady-state assumption, the value of thermal parameters or an a priori knowledge of in situ age distributions. The model is based on a relatively low number of parameters describing lithological variability among the various sub-catchments and their sizes and only uses the raw ages. The method we propose is tested against synthetic age distributions to demonstrate its accuracy and the optimum conditions for its use. In order to illustrate the method, we invert age distributions collected along the main trunk of the Tsangpo-Siang-Brahmaputra river system in the eastern Himalaya. From the inversion of the cooling age distributions we predict present-day erosion rates of the catchments along the Tsangpo-Siang-Brahmaputra river system, as well as some of its tributaries. We show that detrital age distributions contain dual information about present-day erosion rate, i.e., from the predicted distribution of surface ages within each catchment and from the relative contribution of any given catchment to the
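    The underlying mixing idea can be illustrated with a small non-negative least-squares inversion: the detrital age histogram is modeled as a weighted sum of per-catchment age histograms, and the recovered weights (proportional to erosion rate times catchment area) are converted to relative erosion rates. This is only a sketch of the general principle, not the inversion scheme of the paper; all numbers are invented.

        # Toy inversion of relative erosion rates from a detrital age distribution.
        import numpy as np
        from scipy.optimize import nnls

        # Rows: age bins; columns: sub-catchments (hypothetical normalized histograms).
        catchment_hists = np.array([[0.7, 0.1, 0.0],
                                    [0.2, 0.6, 0.1],
                                    [0.1, 0.3, 0.9]])
        areas = np.array([10.0, 25.0, 15.0])            # catchment areas, km^2
        detrital_hist = np.array([0.25, 0.35, 0.40])    # observed detrital age histogram

        weights, _ = nnls(catchment_hists, detrital_hist)   # ~ erosion rate * area
        erosion_rel = weights / areas
        print(erosion_rel / erosion_rel.sum())               # relative erosion rates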

  3. Extracting information on the spatial variability in erosion rate stored in detrital cooling age distributions in river sands

    Directory of Open Access Journals (Sweden)

    J. Braun

    2018-03-01

    Full Text Available One of the main purposes of detrital thermochronology is to provide constraints on the regional-scale exhumation rate and its spatial variability in actively eroding mountain ranges. Procedures that use cooling age distributions coupled with hypsometry and thermal models have been developed in order to extract quantitative estimates of erosion rate and its spatial distribution, assuming steady state between tectonic uplift and erosion. This hypothesis precludes the use of these procedures to assess the likely transient response of mountain belts to changes in tectonic or climatic forcing. Other methods are based on an a priori knowledge of the in situ distribution of ages to interpret the detrital age distributions. In this paper, we describe a simple method that, using the observed detrital mineral age distributions collected along a river, allows us to extract information about the relative distribution of erosion rates in an eroding catchment without relying on a steady-state assumption, the value of thermal parameters or an a priori knowledge of in situ age distributions. The model is based on a relatively low number of parameters describing lithological variability among the various sub-catchments and their sizes and only uses the raw ages. The method we propose is tested against synthetic age distributions to demonstrate its accuracy and the optimum conditions for its use. In order to illustrate the method, we invert age distributions collected along the main trunk of the Tsangpo–Siang–Brahmaputra river system in the eastern Himalaya. From the inversion of the cooling age distributions we predict present-day erosion rates of the catchments along the Tsangpo–Siang–Brahmaputra river system, as well as some of its tributaries. We show that detrital age distributions contain dual information about present-day erosion rate, i.e., from the predicted distribution of surface ages within each catchment and from the relative contribution of

  4. NeSSM: a Next-generation Sequencing Simulator for Metagenomics.

    Directory of Open Access Journals (Sweden)

    Ben Jia

    Full Text Available BACKGROUND: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. RESULTS: We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and pair-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing units) version of NeSSM is also developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one-order of magnitude faster than MetaSim. CONCLUSIONS: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis and it's freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
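    The core of such a simulator can be reduced to a few steps: draw a genome in proportion to its abundance, draw a read start position, and corrupt the read with a per-cycle error profile. The sketch below illustrates only this core loop with invented error rates; NeSSM additionally models coverage bias, platform-specific errors and paired-end reads.

        # Simplified metagenome read simulator (substitution errors only).
        import random

        def simulate_reads(genomes, abundances, n_reads=1000, read_len=100,
                           error_by_cycle=None):
            error_by_cycle = error_by_cycle or [0.001 + 0.0002 * i for i in range(read_len)]
            names = list(genomes)
            weights = [abundances[n] for n in names]
            reads = []
            for _ in range(n_reads):
                name = random.choices(names, weights=weights)[0]   # pick genome by abundance
                seq = genomes[name]
                start = random.randrange(0, len(seq) - read_len)   # random read position
                read = list(seq[start:start + read_len])
                for i, p in enumerate(error_by_cycle):
                    if random.random() < p:                        # per-cycle substitution error
                        read[i] = random.choice([b for b in "ACGT" if b != read[i]])
                reads.append((name, start, "".join(read)))
            return reads

        toy_genomes = {"genomeA": "ACGT" * 500, "genomeB": "GGCATT" * 400}
        print(simulate_reads(toy_genomes, {"genomeA": 0.8, "genomeB": 0.2}, n_reads=3)[0])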

  5. Metagenomic and ecophysiological analysis of biofilms colonizing coral substrates: "Life after death of coral"

    Science.gov (United States)

    Sanchez, A., Sr.; Cerqueda-Garcia, D.; Falcón, L. I.; Iglesias-Prieto, R., Sr.

    2015-12-01

    Coral reefs are the most productive ecosystems on the planet and are the most important carbonate structures of biological origin. However, global warming is affecting the health and functionality of these ecosystems. Specifically, most Acropora sp. stony corals have declined across the Mexican Caribbean, losing more than ~80% of their original coverage and leaving vast extensions of dead coral rubble. When the coral dies, the skeleton begins to be colonized by algae, sponges, bacteria and others, forming a highly diverse biofilm. We analyzed the metagenomes of dead A. palmata rubble from Puerto Morelos, in the Mexican Caribbean. We also quantified the elemental composition of biomass and measured nitrogen fixation and emission of greenhouse gases over 24 hrs. This work provides information on how the community is composed and functions after the death of the coral, offering a possible picture of a world without coral reefs.

  6. Profile hidden Markov models for the detection of viruses within metagenomic sequence data.

    Directory of Open Access Journals (Sweden)

    Peter Skewes-Cox

    Full Text Available Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams") to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3
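    Building and applying such profile HMMs is typically done with the HMMER3 command-line tools; a minimal wrapper is sketched below. File names are placeholders, and the exact options used to build the published vFams are not reproduced here.

        # Sketch: build a profile HMM from a protein alignment and search it against
        # translated metagenomic reads with HMMER3 (hmmbuild / hmmsearch).
        import subprocess

        def build_and_search(msa_fasta, hmm_file, target_proteins, tbl_file):
            subprocess.run(["hmmbuild", hmm_file, msa_fasta], check=True)
            subprocess.run(["hmmsearch", "--tblout", tbl_file, hmm_file, target_proteins],
                           check=True)
            hits = []
            with open(tbl_file) as fh:
                for line in fh:
                    if not line.startswith("#"):
                        fields = line.split()
                        hits.append((fields[0], float(fields[4])))  # target, full-seq E-value
            return hits

        # Example call with hypothetical file names:
        # hits = build_and_search("cluster42.afa", "cluster42.hmm", "orfs.faa", "hits.tbl")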

  7. A metagenomic snapshot of taxonomic and functional diversity in an alpine glacier cryoconite ecosystem

    International Nuclear Information System (INIS)

    Edwards, Arwyn; Pachebat, Justin A; Swain, Martin; Hegarty, Matt; Rassner, Sara M E; Hodson, Andrew J; Irvine-Fynn, Tristram D L; Sattler, Birgit

    2013-01-01

    Cryoconite is a microbe–mineral aggregate which darkens the ice surface of glaciers. Microbial process and marker gene PCR-dependent measurements reveal active and diverse cryoconite microbial communities on polar glaciers. Here, we provide the first report of a cryoconite metagenome and culture-independent study of alpine cryoconite microbial diversity. We assembled 1.2 Gbp of metagenomic DNA sequenced using an Illumina HiScanSQ from cryoconite holes across the ablation zone of Rotmoosferner in the Austrian Alps. The metagenome revealed a bacterially-dominated community, with Proteobacteria (62% of bacterial-assigned contigs) and Bacteroidetes (14%) considerably more abundant than Cyanobacteria (2.5%). Streptophyte DNA dominated the eukaryotic metagenome. Functional genes linked to N, Fe, S and P cycling illustrated an acquisitive trend and a nitrogen cycle based upon efficient ammonia recycling. A comparison of 32 metagenome datasets revealed a similarity in functional profiles between the cryoconite and metagenomes characterized from other cold microbe–mineral aggregates. Overall, the metagenomic snapshot reveals the cryoconite ecosystem of this alpine glacier as dependent on scavenging carbon and nutrients from allochthonous sources, in particular mosses transported by wind from ice-marginal habitats, consistent with net heterotrophy indicated by productivity measurements. A transition from singular snapshots of cryoconite metagenomes to comparative analyses is advocated. (letter)

  8. Beyond research: a primer for considerations on using viral metagenomics in the field and clinic

    NARCIS (Netherlands)

    Hall, Richard J; Draper, Jenny L; Nielsen, Fiona G G; Dutilh, Bas E

    2015-01-01

    Powered by recent advances in next-generation sequencing technologies, metagenomics has already unveiled vast microbial biodiversity in a range of environments, and is increasingly being applied in clinics for difficult-to-diagnose cases. It can be tempting to suggest that metagenomics could be used

  9. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

    NARCIS (Netherlands)

    Dutilh, Bas E; Cassman, Noriko; McNair, Katelyn; Sanchez, Savannah E; Silva, Genivaldo G Z; Boling, Lance; Barr, Jeremy J; Speth, Daan R; Seguritan, Victor; Aziz, Ramy K; Felts, Ben; Dinsdale, Elizabeth A; Mokili, John L; Edwards, Robert A

    2014-01-01

    Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the

  10. PCR screening of an African fermented pearl-millet porridge metagenome to investigate the nutritional potential of its microbiota.

    Science.gov (United States)

    Saubade, Fabien; Humblot, Christèle; Hemery, Youna M; Guyot, Jean-Pierre

    2017-03-06

    Cereals are staple foods in most African countries, and many African cereal-based foods are spontaneously fermented. The nutritional quality of cereal products can be enhanced through fermentation, and traditional cereal-based fermented foods (CBFFs) are possible sources of lactic acid bacteria (LAB) with useful nutritional properties. The nutritional properties of LAB vary depending on the species and even on the strain, and the microbial composition of traditional CBFFs varies from one traditional production unit (TPU) to another. The nutritional quality of traditional CBFFs may thus vary depending on their microbial composition. As the isolation of potentially useful LAB from traditional CBFFs can be very time-consuming, the aim of this study was to use PCR to assess the nutritional potential of LAB directly on the metagenomes of pearl-millet-based fermented porridges (ben-saalga) from Burkina Faso. Genes encoding enzymes involved in different nutritional activities were screened in 50 metagenomes extracted from samples collected in 10 TPUs in Ouagadougou. The variability of the genetic potential was recorded. Certain genes were never detected in the metagenomes (genes involved in carotenoid synthesis) while others were frequently detected (genes involved in folate and riboflavin production, starch hydrolysis, polyphenol degradation). Highly variable microbial composition - assessed by real-time PCR - was observed among samples collected in different TPUs, but also among samples from the same TPU. The high frequency of the presence of genes did not necessarily correlate with in situ measurements of the expected products. Indeed, no significant correlation was found between the microbial variability and the variability of the genetic potential. In spite of the high rate of detection (80%) of both genes folP and folK, encoding enzymes involved in folate synthesis, the folate content in ben-saalga was rather low (median: 0.5 μg/100 g fresh weight basis). This work

  11. Introduction to Metagenomics at DOE JGI: Program Overview and Program Informatics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Tringe, Susannah

    2011-10-12

    Susannah Tringe of the DOE Joint Genome Institute talks about the Program Overview and Program Informatics at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  12. Metagenomic Analysis of the Sponge Discodermia Reveals the Production of the Cyanobacterial Natural Product Kasumigamide by 'Entotheonella'.

    Science.gov (United States)

    Nakashima, Yu; Egami, Yoko; Kimura, Miki; Wakimoto, Toshiyuki; Abe, Ikuro

    2016-01-01

    Sponge metagenomes are a useful platform to mine cryptic biosynthetic gene clusters responsible for production of natural products involved in the sponge-microbe association. Since numerous sponge-derived bioactive metabolites are biosynthesized by the symbiotic bacteria, this strategy may concurrently reveal sponge-symbiont produced compounds. Accordingly, a metagenomic analysis of the Japanese marine sponge Discodermia calyx has resulted in the identification of a hybrid type I polyketide synthase-nonribosomal peptide synthetase gene (kas). Bioinformatic analysis of the gene product suggested its involvement in the biosynthesis of kasumigamide, a tetrapeptide originally isolated from freshwater free-living cyanobacterium Microcystis aeruginosa NIES-87. Subsequent investigation of the sponge metabolic profile revealed the presence of kasumigamide in the sponge extract. The kasumigamide producing bacterium was identified as an 'Entotheonella' sp. Moreover, an in silico analysis of kas gene homologs uncovered the presence of kas family genes in two additional bacteria from different phyla. The production of kasumigamide by distantly related multiple bacterial strains implicates horizontal gene transfer and raises the potential for a wider distribution across other bacterial groups.

  13. Discovery of Microorganisms and Enzymes Involved in High-Solids Decomposition of Rice Straw Using Metagenomic Analyses

    Science.gov (United States)

    D’haeseleer, Patrik; Khudyakov, Jane; Burd, Helcio; Hadi, Masood; Simmons, Blake A.; Singer, Steven W.; Thelen, Michael P.; VanderGheynst, Jean S.

    2013-01-01

    High-solids incubations were performed to enrich for microbial communities and enzymes that decompose rice straw under mesophilic (35°C) and thermophilic (55°C) conditions. Thermophilic enrichments yielded a community that was 7.5 times more metabolically active on rice straw than mesophilic enrichments. Extracted xylanase and endoglucanase activities were also 2.6 and 13.4 times greater, respectively, for thermophilic enrichments. Metagenome sequencing was performed on enriched communities to determine community composition and mine for genes encoding lignocellulolytic enzymes. Proteobacteria were found to dominate the mesophilic community while Actinobacteria were most abundant in the thermophilic community. Analysis of protein family representation in each metagenome indicated that cellobiohydrolases containing carbohydrate binding module 2 (CBM2) were significantly overrepresented in the thermophilic community. Micromonospora, a member of Actinobacteria, primarily housed these genes in the thermophilic community. In light of these findings, Micromonospora and other closely related Actinobacteria genera appear to be promising sources of thermophilic lignocellulolytic enzymes for rice straw deconstruction under high-solids conditions. Furthermore, these discoveries warrant future research to determine if exoglucanases with CBM2 represent thermostable enzymes tolerant to the process conditions expected to be encountered during industrial biofuel production. PMID:24205054

  14. Discovery of microorganisms and enzymes involved in high-solids decomposition of rice straw using metagenomic analyses.

    Directory of Open Access Journals (Sweden)

    Amitha P Reddy

    Full Text Available High-solids incubations were performed to enrich for microbial communities and enzymes that decompose rice straw under mesophilic (35°C) and thermophilic (55°C) conditions. Thermophilic enrichments yielded a community that was 7.5 times more metabolically active on rice straw than mesophilic enrichments. Extracted xylanase and endoglucanase activities were also 2.6 and 13.4 times greater, respectively, for thermophilic enrichments. Metagenome sequencing was performed on enriched communities to determine community composition and mine for genes encoding lignocellulolytic enzymes. Proteobacteria were found to dominate the mesophilic community while Actinobacteria were most abundant in the thermophilic community. Analysis of protein family representation in each metagenome indicated that cellobiohydrolases containing carbohydrate binding module 2 (CBM2) were significantly overrepresented in the thermophilic community. Micromonospora, a member of Actinobacteria, primarily housed these genes in the thermophilic community. In light of these findings, Micromonospora and other closely related Actinobacteria genera appear to be promising sources of thermophilic lignocellulolytic enzymes for rice straw deconstruction under high-solids conditions. Furthermore, these discoveries warrant future research to determine if exoglucanases with CBM2 represent thermostable enzymes tolerant to the process conditions expected to be encountered during industrial biofuel production.

  15. In vitro and in silico characterization of metagenomic soil-derived cellulases capable of hydrolyzing oil palm empty fruit bunch

    Directory of Open Access Journals (Sweden)

    Laura Marcela Palma Medina

    2017-09-01

    Full Text Available Diversification of raw material for biofuel production is of interest to both academia and industry. One attractive substrate is a renewable lignocellulosic material such as oil palm (Elaeis guineensis Jacq.) empty fruit bunch (OPEFB), which is a byproduct of the palm oil industry. This study aimed to characterize cellulases active against this substrate. Cellulases with activity against OPEFB were identified from a metagenomic library obtained from DNA extracted from a high-Andean forest ecosystem. Our findings show that the highest cellulolytic activities were obtained at pH and temperature ranges of 4–10 and 30 °C–60 °C, respectively. Due to the heterogeneous character of the system, degradation profiles were fitted to a fractal-like kinetic model, evidencing transport mass transfer limitations. The sequence analysis of the metagenomic library inserts revealed three glycosyl hydrolase families. Finally, molecular docking simulations of the cellulases were carried out corroborating possible exoglucanase and β-glucosidase activity.

  16. Survey of endosymbionts in the Diaphorina citri metagenome and assembly of a Wolbachia wDi draft genome.

    Directory of Open Access Journals (Sweden)

    Surya Saha

    Full Text Available Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.

  17. Survey of endosymbionts in the Diaphorina citri metagenome and assembly of a Wolbachia wDi draft genome.

    Science.gov (United States)

    Saha, Surya; Hunter, Wayne B; Reese, Justin; Morgan, J Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen

    2012-01-01

    Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.

  18. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes.

    Science.gov (United States)

    Nielsen, H Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R; Gautier, Laurent; Pedersen, Anders G; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; Quintanilha Dos Santos, Marcelo B; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S; Boumezbeur, Fouad; Casellas, Francesc; Doré, Joël; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Léonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sørensen, Søren; Tap, Julien; Tims, Sebastian; Ussery, David W; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Søren; Ehrlich, S Dusko

    2014-08-01

    Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
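    The essence of co-abundance binning is that genes from the same genome rise and fall together across samples; a toy correlation-based grouping is sketched below. The published method uses a far more scalable canopy-based clustering, so this is only an illustration of the idea, with invented data and thresholds.

        # Toy co-abundance grouping of genes by correlated abundance across samples.
        import numpy as np

        def coabundance_groups(abundance, gene_names, r_min=0.9):
            """abundance: genes x samples matrix of normalized gene abundances."""
            corr = np.corrcoef(abundance)                 # gene-by-gene Pearson correlation
            unassigned = set(range(len(gene_names)))
            groups = []
            while unassigned:
                seed = unassigned.pop()
                members = {seed} | {g for g in unassigned if corr[seed, g] >= r_min}
                unassigned -= members
                groups.append(sorted(gene_names[g] for g in members))
            return groups

        profile = np.array([0.1, 0.4, 0.35, 0.8, 0.55, 0.2, 0.9, 0.65])   # 8 samples
        other = np.array([0.7, 0.1, 0.9, 0.2, 0.6, 0.8, 0.3, 0.5])
        matrix = np.vstack([profile * 5, profile * 2 + 0.01, other])
        print(coabundance_groups(matrix, ["geneA", "geneB", "geneC"]))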

  19. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES: DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    OpenAIRE

    J. Sharmila; A. Subramani

    2016-01-01

    Web mining research is becoming increasingly important because a large amount of information is now managed through the web, and web usage is expanding in an uncontrolled way. A dedicated framework is therefore required for handling such large volumes of information in the web space. Web mining is classified into three major divisions: web content mining, web usage mining and web structure mining. Tak-Lam Wong has proposed a web content mining methodolog...
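    One of the techniques compared in the study, Naive Bayes, can be illustrated with a few lines of scikit-learn: bag-of-words features plus a multinomial Naive Bayes classifier for labelling documents. The documents and labels below are invented and do not reproduce the authors' experimental setup.

        # Tiny Naive Bayes text classifier of the kind compared for web documents.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        docs = ["cheap flights book now", "conference paper deadline extended",
                "win a free prize click here", "seminar on neural networks today"]
        labels = ["spam", "academic", "spam", "academic"]

        clf = make_pipeline(CountVectorizer(), MultinomialNB())
        clf.fit(docs, labels)
        print(clf.predict(["free conference prize"]))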

  20. MG-Digger: an automated pipeline to search for giant virus-related sequences in metagenomes

    Directory of Open Access Journals (Sweden)

    Jonathan eVerneau

    2016-03-01

    Full Text Available The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter’. We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 terabase pairs) were collected, processed and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and five virophages, were collected. The pipeline was generated by scripts written in the Python language and entitled MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are

  1. Combining gene prediction methods to improve metagenomic gene annotation

    Directory of Open Access Journals (Sweden)

    Rosen Gail L

    2011-01-01

    Full Text Available Background: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results: We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
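    The consensus and unanimous-vote strategies described above reduce, per read, to simple vote counting over the individual predictors' coding/non-coding calls; a minimal sketch follows (predictor outputs are toy booleans, and real tools also report gene coordinates).

        # Minimal majority-vote / unanimous combination of per-read coding calls.
        from collections import Counter

        def consensus_call(calls, mode="majority"):
            """calls: list of booleans, True = read predicted as coding."""
            votes = Counter(calls)
            if mode == "unanimous":
                return votes[True] == len(calls)
            return votes[True] > len(calls) / 2

        per_read = {
            "read_001": [True, True, False],    # e.g. GeneMark, Orphelia, a third tool
            "read_002": [False, False, True],
        }
        for read, calls in per_read.items():
            print(read, "coding" if consensus_call(calls) else "non-coding")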

  2. Assembling the Marine Metagenome, One Cell at a Time

    Energy Technology Data Exchange (ETDEWEB)

    Woyke, Tanja; Xie, Gary; Copeland, Alex; Gonzalez, Jose M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2010-06-24

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  3. The MAR databases: development and implementation of databases specific for marine metagenomics.

    Science.gov (United States)

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases: MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of their level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Metagenomic analysis of the bacterial communities and their functional profiles in water and sediments of the Apies River, South Africa, as a function of land use.

    Science.gov (United States)

    Abia, Akebe Luther King; Alisoltani, Arghavan; Keshri, Jitendra; Ubomba-Jaswa, Eunice

    2018-03-01

    Water quality is an important public health issue given that the presence of pathogenic organisms in such waters can adversely affect human and animal health. Despite the numerous studies conducted to assess the quality of environmental waters in many countries, limited effort has gone into investigating the microbial quality of sediments in developing countries and how this relates to different land uses. The present study evaluated the bacterial diversity in water and sediments of a highly used South African river to determine how the different land uses influenced the bacterial diversity, and to identify human-disease-associated functional classes among the bacterial populations. Samples were collected on river stretches influenced by an informal, a peri-urban and a rural settlement. Genomic DNA was extracted from water and sediment samples and sequenced on an Illumina® MiSeq platform targeting the 16S rRNA gene variable region V3-V4. Metagenomic data analysis revealed that there was a great diversity in the microbial populations associated with the different land uses, with the informal settlement having the most considerable influence on the bacterial diversity in the water and sediments of the Apies River. The Proteobacteria (69.8%), Cyanobacteria (4.3%), Bacteroidetes (2.7%), and Actinobacteria (2.7%) were the most abundant phyla; the Alphaproteobacteria, Betaproteobacteria and Anaerolineae were the most recorded classes. The sediments also showed greater bacterial diversity and abundance than the water column. The functional profiles of the bacterial populations revealed an association with many human diseases, including cancer pathways. Further studies that isolate these potentially pathogenic organisms from the aquatic environment are therefore needed, as this would help protect communities using such rivers, especially against emerging bacterial pathogens. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Fast and sensitive taxonomic classification for metagenomics with Kaiju

    DEFF Research Database (Denmark)

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    We show in a genome exclusion study that Kaiju can classify more reads with higher sensitivity and similar precision compared to fast k-mer based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies more than twice as many reads in ten real metagenomes compared to programs based on genomic k-mers. Kaiju can process up to millions of reads per minute, and its memory footprint is below 5 GB of RAM, allowing the analysis on a standard PC. The program is available under the GPL3 license at: github.com/bioinformatics-centre/kaiju

  6. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence

  7. Comparative metagenomics of eight geographically remote terrestrial hot springs

    DEFF Research Database (Denmark)

    Menzel, Peter; Islin, Sóley Ruth; Rike, Anne Gunn

    2015-01-01

    Hot springs are natural habitats for thermophilic Archaea and Bacteria. In this paper, we present the metagenomic analysis of eight globally distributed terrestrial hot springs from China, Iceland, Italy, Russia, and the USA with a temperature range between 61 and 92 °C and pH between 1.8 and 7. A comparison of the biodiversity and community composition generally showed a decrease in biodiversity with increasing temperature and decreasing pH. Another important factor shaping microbial diversity of the studied sites was the abundance of organic substrates. Several species of the Crenarchaeal order

  8. Metagenomics and development of the gut microbiota in infants

    DEFF Research Database (Denmark)

    Vallès, Y.; Gosalbes, M. J.; de Vries, Lisbeth Elvira

    2012-01-01

    Clin Microbiol Infect 2012; 18 (Suppl. 4): 21–26 The establishment of a balanced intestinal microbiota is essential for numerous aspects of human health, yet the microbial colonization of the gastrointestinal tract of infants is both complex and highly variable among individuals. In addition, the gastrointestinal tract microbiota is often exposed to antibiotics, and may be an important reservoir of resistant strains and of transferable resistance genes from early infancy. We are investigating by means of diverse metagenomic approaches several areas of microbiota development in infants, including

  9. An analytical framework for extracting hydrological information from time series of small reservoirs in a semi-arid region

    Science.gov (United States)

    Annor, Frank; van de Giesen, Nick; Bogaard, Thom; Eilander, Dirk

    2013-04-01

    small reservoirs in the Upper East Region of Ghana. Reservoirs without obvious large seepage losses (field survey) were selected. To verify this, stable water isotopic samples are collected from groundwater upstream and downstream from the reservoir. By looking at possible enrichment of downstream groundwater, a good estimate of seepage can be made in addition to estimates on evaporation. We estimated the evaporative losses and compared those with field measurements using eddy correlation measurements. Lastly, we determined the cumulative surface runoff curves for the small reservoirs. We will present this analytical framework for extracting hydrological information from time series of small reservoirs and show the first results for our study region of northern Ghana.

  10. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis

    DEFF Research Database (Denmark)

    Bahl, Martin Iain; Bergström, Anders; Licht, Tine Rask

    2012-01-01

    Freezing stool samples prior to DNA extraction and downstream analysis is widely used in metagenomic studies of the human microbiota but may affect the inferred community composition. In this study, DNA was extracted either directly or following freeze storage of three homogenized human fecal...
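    A Firmicutes to Bacteroidetes ratio from qPCR data is commonly derived from the difference in threshold cycles, assuming roughly two-fold amplification per cycle; a toy calculation is shown below (Ct values are invented, and real analyses calibrate against standard curves rather than using this shortcut).

        # Toy Firmicutes:Bacteroidetes ratio from qPCR threshold cycles (2^-dCt).
        def fb_ratio(ct_firmicutes, ct_bacteroidetes):
            # A lower Ct means more template; one cycle ~ a two-fold difference.
            return 2.0 ** (ct_bacteroidetes - ct_firmicutes)

        print(round(fb_ratio(ct_firmicutes=21.5, ct_bacteroidetes=23.0), 2))  # ~2.83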

  11. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis

    DEFF Research Database (Denmark)

    Bahl, Martin Iain; Bergström, Anders; Licht, Tine Rask

    Freezing stool samples prior to DNA extraction and downstream analysis is widely used in metagenomic studies of the human microbiota but may affect the inferred community composition. In this study DNA was extracted either directly or following freeze storage of three homogenized human fecal...

  12. Vikodak--A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Sunil Nagpal

    Full Text Available The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Consequently, prior (collated) information pertaining to genes, enzymes encoded by the resident microbes can aid in indirectly (re)constructing/inferring the metabolic/functional potential of a given microbial community (given its taxonomic abundance profile). In this study, we present Vikodak--a multi-modular package that is based on the above assumption and automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in various modules incorporated in Vikodak. Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a) deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b) functional resolution of distinct metagenomic environments, (c) inferring patterns of functional interaction between resident microbes, and (d) automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions. With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons, in-depth analysis of individual metabolic pathways and greater flexibilities at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S based function prediction. A web implementation of Vikodak can be publicly accessed at: http://metagenomics
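    The general principle of inferring community-level functional potential from a 16S taxonomic profile is a weighted sum of precollated per-taxon gene contributions; the toy matrix product below illustrates that principle only, with an invented contribution table, and is not Vikodak's algorithm.

        # Toy functional-potential inference: 16S abundance vector x (taxa x function)
        # contribution table. KO identifiers and copy numbers are hypothetical.
        import numpy as np

        taxa = ["Prevotella", "Bacteroides", "Faecalibacterium"]
        functions = ["K00001", "K00002", "K00003"]

        contribution = np.array([[2.0, 0.0, 1.0],     # copies of each function per taxon
                                 [1.0, 3.0, 0.0],
                                 [0.0, 1.0, 4.0]])
        abundance = np.array([0.5, 0.3, 0.2])          # relative 16S abundances

        functional_profile = abundance @ contribution  # weighted sum per function
        for ko, value in zip(functions, functional_profile):
            print(ko, round(float(value), 2))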

  13. A metagenomic analysis of pandemic influenza A (2009 H1N1) infection in patients from North America.

    Directory of Open Access Journals (Sweden)

    Alexander L Greninger

    2010-10-01

    Full Text Available Although metagenomics has been previously employed for pathogen discovery, its cost and complexity have prevented its use as a practical front-line diagnostic for unknown infectious diseases. Here we demonstrate the utility of two metagenomics-based strategies, a pan-viral microarray (Virochip) and deep sequencing, for the identification and characterization of 2009 pandemic H1N1 influenza A virus. Using nasopharyngeal swabs collected during the earliest stages of the pandemic in Mexico, Canada, and the United States (n = 17), the Virochip was able to detect a novel virus most closely related to swine influenza viruses without a priori information. Deep sequencing yielded reads corresponding to 2009 H1N1 influenza in each sample (percentage of aligned sequences corresponding to 2009 H1N1 ranging from 0.0011% to 10.9%), with up to 97% coverage of the influenza genome in one sample. Detection of 2009 H1N1 by deep sequencing was possible even at titers near the limits of detection for specific RT-PCR, and the percentage of sequence reads was linearly correlated with virus titer. Deep sequencing also provided insights into the upper respiratory microbiota and host gene expression in response to 2009 H1N1 infection. An unbiased analysis combining sequence data from all 17 outbreak samples revealed that 90% of the 2009 H1N1 genome could be assembled de novo without the use of any reference sequence, including assembly of several near full-length genomic segments. These results indicate that a streamlined metagenomics detection strategy can potentially replace the multiple conventional diagnostic tests required to investigate an outbreak of a novel pathogen, and provide a blueprint for comprehensive diagnosis of unexplained acute illnesses or outbreaks in clinical and public health settings.

  14. Extraction of DNA from plant and fungus tissues in situ

    Directory of Open Access Journals (Sweden)

    Abu Almakarem Amal S

    2012-06-01

    Full Text Available Background: When samples are collected in the field and transported to the lab, degradation of the nucleic acids contained in the samples is frequently observed. Immediate extraction and precipitation of the nucleic acids reduces degradation to a minimum, thus preserving accurate sequence information. An extraction method to obtain high quality DNA in field studies is described. Findings: DNA extracted immediately after sampling was compared to DNA extracted after allowing the sampled tissues to air dry at 21°C for 48 or 72 hours. While DNA extracted from fresh tissues exhibited little degradation, DNA extracted from all tissues exposed to 21°C air for 48 or 72 hours exhibited varying degrees of degradation. Yield was higher for extractions from fresh tissues in most cases. Four microcentrifuges were compared for DNA yield: one standard electric laboratory microcentrifuge (max rcf = 16,000×g), two battery-operated microcentrifuges (max rcf = 5,000 and 3,000×g), and one manually-operated microcentrifuge (max rcf = 120×g). Yields for all centrifuges were similar. DNA extracted under simulated field conditions was similar in yield and quality to DNA extracted in the laboratory using the same equipment. Conclusions: This CTAB (cetyltrimethylammonium bromide) DNA extraction method employs battery-operated and manually-operated equipment to isolate high quality DNA in the field. The method was tested on plant and fungus tissues, and may be adapted for other types of organisms. The method produced high quality DNA in laboratory tests and under simulated field conditions. The field extraction method should prove useful for working in remote sites, where ice, dry ice, and liquid nitrogen are unavailable; where degradation is likely to occur due to the long distances between the sample site and the laboratory; and in instances where other DNA preservation and transportation methods have been unsuccessful. It may be possible to adapt

  15. Functional metagenomic profiling of intestinal microbiome in extreme ageing

    Science.gov (United States)

    Rampelli, Simone; Candela, Marco; Turroni, Silvia; Biagi, Elena; Collino, Sebastiano; Franceschi, Claudio; O'Toole, Paul W; Brigidi, Patrizia

    2013-01-01

    Age-related alterations in human gut microbiota composition have been thoroughly described, but a detailed functional description of the intestinal bacterial coding capacity is still missing. In order to elucidate the contribution of the gut metagenome to the complex mosaic of human longevity, we applied shotgun sequencing to total fecal bacterial DNA in a selection of samples belonging to a well-characterized human ageing cohort. The age-related trajectory of the human gut microbiome was characterized by loss of genes for short-chain fatty acid production and an overall decrease in the saccharolytic potential, while proteolytic functions were more abundant than in the intestinal metagenome of younger adults. This altered functional profile was associated with a relevant enrichment in “pathobionts”, i.e. opportunistic pro-inflammatory bacteria generally present in the adult gut ecosystem in low numbers. Finally, as a signature for long life we identified 116 microbial genes that significantly correlated with ageing. Collectively, our data emphasize the relationship between intestinal bacteria and human metabolism, by detailing the modifications in the gut microbiota as a consequence of and/or promoter of the physiological changes occurring in the human host upon ageing. PMID:24334635
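    Screening for genes whose abundance tracks host age, as in the 116-gene signature mentioned above, can be illustrated as per-gene Spearman correlation followed by multiple-testing correction; the sketch below uses simulated data and is not the authors' pipeline.

        # Sketch: per-gene Spearman correlation with age + Benjamini-Hochberg FDR.
        import numpy as np
        from scipy.stats import spearmanr
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(1)
        ages = rng.uniform(25, 105, size=40)            # 40 subjects
        genes = rng.random((200, 40))                   # 200 gene abundance profiles
        genes[0] += ages / 200.0                        # plant one age-associated gene

        pvals = np.array([spearmanr(g, ages).pvalue for g in genes])
        reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print("genes significant after FDR correction:", int(reject.sum()))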

  16. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    Science.gov (United States)

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.
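    The FM-index at the heart of Centrifuge rests on backward search over the Burrows-Wheeler transform; a minimal, unoptimized sketch of BWT-based exact-match counting is given below to show the idea (Centrifuge's real index adds compression, rank sampling and taxonomy-aware sequence reduction).

        # Minimal BWT backward search: count exact occurrences of a pattern.
        def bwt(text):
            text += "$"
            rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
            return "".join(rot[-1] for rot in rotations)

        def count_occurrences(text, pattern):
            b = bwt(text)
            first_col = sorted(b)
            C = {c: first_col.index(c) for c in set(b)}   # chars lexically smaller than c
            rank = lambda c, i: b[:i].count(c)            # naive rank query over the BWT
            lo, hi = 0, len(b)
            for c in reversed(pattern):
                if c not in C:
                    return 0
                lo, hi = C[c] + rank(c, lo), C[c] + rank(c, hi)
                if lo >= hi:
                    return 0
            return hi - lo

        print(count_occurrences("ACGTACGTTACG", "ACG"))   # 3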

  17. Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis.

    Science.gov (United States)

    Wen, Chengping; Zheng, Zhijun; Shao, Tiejuan; Liu, Lin; Xie, Zhijun; Le Chatelier, Emmanuelle; He, Zhixing; Zhong, Wendi; Fan, Yongsheng; Zhang, Linshuang; Li, Haichang; Wu, Chunyan; Hu, Changfeng; Xu, Qian; Zhou, Jia; Cai, Shunfeng; Wang, Dawei; Huang, Yun; Breban, Maxime; Qin, Nan; Ehrlich, Stanislav Dusko

    2017-07-27

    The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Ankylosing spondylitis is an inflammatory autoimmune disease and evidence showed that ankylosing spondylitis may be a microbiome-driven disease. To investigate the relationship between the gut microbiome and ankylosing spondylitis, a quantitative metagenomics study based on deep shotgun sequencing was performed, using gut microbial DNA from 211 Chinese individuals. A total of 23,709 genes and 12 metagenomic species were shown to be differentially abundant between ankylosing spondylitis patients and healthy controls. Patients were characterized by a form of gut microbial dysbiosis that is more prominent than previously reported cases with inflammatory bowel disease. Specifically, the ankylosing spondylitis patients demonstrated increases in the abundance of Prevotella melaninogenica, Prevotella copri, and Prevotella sp. C561 and decreases in Bacteroides spp. It is noteworthy that the Bifidobacterium genus, which is commonly used in probiotics, accumulated in the ankylosing spondylitis patients. Diagnostic algorithms were established using a subset of these gut microbial biomarkers. Alterations of the gut microbiome are associated with development of ankylosing spondylitis. Our data suggest biomarkers identified in this study might participate in the pathogenesis or development process of ankylosing spondylitis, providing new leads for the development of new diagnostic tools and potential treatments.
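    The diagnostic step, building a classifier from a subset of microbial biomarkers, can be illustrated with a simple cross-validated logistic regression on species abundance profiles; the data below are simulated, and the study's actual diagnostic algorithm and feature set are not reproduced.

        # Sketch of a biomarker-based diagnostic classifier on species abundances.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(42)
        n_cases, n_controls, n_species = 60, 60, 30
        X = rng.random((n_cases + n_controls, n_species))
        X[:n_cases, 0] += 0.3                         # e.g. one species enriched in cases
        y = np.array([1] * n_cases + [0] * n_controls)

        model = LogisticRegression(max_iter=1000)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print("mean cross-validated AUC:", round(float(auc.mean()), 2))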

  18. Microbial survival strategies in ancient permafrost: insights from metagenomics.

    Science.gov (United States)

    Mackelprang, Rachel; Burkert, Alexander; Haw, Monica; Mahendrarajah, Tara; Conaway, Christopher H; Douglas, Thomas A; Waldrop, Mark P

    2017-10-01

    In permafrost (perennially frozen ground) microbes survive oligotrophic conditions, sub-zero temperatures, low water availability and high salinity over millennia. Viable life exists in permafrost tens of thousands of years old but we know little about the metabolic and physiological adaptations to the challenges presented by life in frozen ground over geologic time. In this study we asked whether increasing age and the associated stressors drive adaptive changes in community composition and function. We conducted deep metagenomic and 16S rRNA gene sequencing across a Pleistocene permafrost chronosequence from 19 000 to 33 000 years before present (kyr). We found that age markedly affected community composition and reduced diversity. Reconstruction of paleovegetation from metagenomic sequence suggests vegetation differences in the paleo record are not responsible for shifts in community composition and function. Rather, we observed shifts consistent with long-term survival strategies in extreme cryogenic environments. These include increased reliance on scavenging detrital biomass, horizontal gene transfer, chemotaxis, dormancy, environmental sensing and stress response. Our results identify traits that may enable survival in ancient cryoenvironments with no influx of energy or new materials.

  19. Functional metagenomic profiling of intestinal microbiome in extreme ageing.

    Science.gov (United States)

    Rampelli, Simone; Candela, Marco; Turroni, Silvia; Biagi, Elena; Collino, Sebastiano; Franceschi, Claudio; O'Toole, Paul W; Brigidi, Patrizia

    2013-12-01

    Age-related alterations in human gut microbiota composition have been thoroughly described, but a detailed functional description of the intestinal bacterial coding capacity is still missing. In order to elucidate the contribution of the gut metagenome to the complex mosaic of human longevity, we applied shotgun sequencing to total fecal bacterial DNA in a selection of samples belonging to a well-characterized human ageing cohort. The age-related trajectory of the human gut microbiome was characterized by loss of genes for short-chain fatty acid production and an overall decrease in the saccharolytic potential, while proteolytic functions were more abundant than in the intestinal metagenome of younger adults. This altered functional profile was associated with a relevant enrichment in "pathobionts", i.e. opportunistic pro-inflammatory bacteria generally present in the adult gut ecosystem in low numbers. Finally, as a signature for long life we identified 116 microbial genes that significantly correlated with ageing. Collectively, our data emphasize the relationship between intestinal bacteria and human metabolism, by detailing the modifications in the gut microbiota as a consequence of and/or promoter of the physiological changes occurring in the human host upon ageing.

  20. Genomic and metagenomic technologies to explore the antibiotic resistance mobilome.

    Science.gov (United States)

    Martínez, José L; Coque, Teresa M; Lanza, Val F; de la Cruz, Fernando; Baquero, Fernando

    2017-01-01

    Antibiotic resistance is a relevant problem for human health that requires global approaches to establish a deep understanding of the processes of acquisition, stabilization, and spread of resistance among human bacterial pathogens. Since natural (nonclinical) ecosystems are reservoirs of resistance genes, a health-integrated study of the epidemiology of antibiotic resistance requires the exploration of such ecosystems with the aim of determining the role they may play in the selection, evolution, and spread of antibiotic resistance genes, involving the so-called resistance mobilome. High-throughput sequencing techniques allow an unprecedented opportunity to describe the genetic composition of a given microbiome without the need to subculture the organisms present inside. However, bioinformatic methods for analyzing this bulk of data, mainly with respect to binning each resistance gene with the organism hosting it, are still in their infancy. Here, we discuss how current genomic methodologies can serve to analyze the resistance mobilome and its linkage with different bacterial genomes and metagenomes. In addition, we describe the drawbacks of current methodologies for analyzing the resistance mobilome, mainly in cases of complex microbiotas, and discuss the possibility of implementing novel tools to improve our current metagenomic toolbox. © 2016 New York Academy of Sciences.