WorldWideScience

Sample records for metagenomics extracting information

  1. DNA extraction for streamlined metagenomics of diverse environmental samples.

    Science.gov (United States)

    Marotz, Clarisse; Amir, Amnon; Humphrey, Greg; Gaffney, James; Gogul, Grant; Knight, Rob

    2017-06-01

    A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.

  2. Metagenomes provide valuable comparative information on soil microeukaryotes

    DEFF Research Database (Denmark)

    Jacquiod, Samuel Jehan Auguste; Stenbæk, Jonas; Santos, Susana S.

    2016-01-01

    , providing microbiologists with substantial amounts of accessible information. We took advantage of public metagenomes in order to investigate microeukaryote communities in a well characterized grassland soil. The data gathered allowed the evaluation of several factors impacting the community structure...... has been identified. Our analyses suggest that publicly available metagenome data can provide valuable information on soil microeukaryotes for comparative purposes when handled appropriately, complementing the current view provided by ribosomal amplicon sequencing methods....

  3. An extended genovo metagenomic assembler by incorporating paired-end information

    Directory of Open Access Journals (Sweden)

    Afiahayati

    2013-10-01

    Metagenomes present assembly challenges when multiple genomes must be assembled from mixed reads of multiple species. An assembler for single genomes does not adapt well to this case. A metagenomic assembler, Genovo, is a de novo assembler for metagenomes under a generative probabilistic model. Genovo assembles all reads without discarding any reads in a preprocessing step, and is therefore able to extract more information from metagenomic data and, in principle, generate better assembly results. Paired-end sequencing is currently widely used, yet Genovo was designed for 454 single-end reads. In this research, we attempted to extend Genovo by incorporating paired-end information, named Xgenovo, so that it generates higher quality assemblies with paired-end reads. First, we extended Genovo by adding a bonus parameter in the Chinese Restaurant Process used to get prior accounts for the unknown number of genomes in the sample. This bonus parameter intends for a pair of reads to be in the same contig and as an effort to solve chimera contig case. Second, we modified the sampling process of the location of a read in a contig. We used relative distance for the number of trials in the symmetric geometric distribution instead of using distance between the offset and the center of contig used in Genovo. Using this relative distance, a read sampled in the appropriate location has higher probability. Therefore a read will be mapped in the correct location. Results of extensive experiments on simulated metagenomic datasets from simple to complex with species coverage setting following uniform and lognormal distribution showed that Xgenovo can be superior to the original Genovo and the recently proposed metagenome assembler for 454 reads, MAP. Xgenovo successfully generated longer N50 than Genovo and MAP while maintaining the assembly quality even for very complex metagenomic datasets consisting of 115 species. Xgenovo also demonstrated the potential to
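
    The N50 statistic used in the record above to compare assemblers can be computed from contig lengths alone. The following is a minimal sketch of the common definition (the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size). The function name `n50` and the sample lengths are illustrative assumptions, not code or data from Xgenovo, Genovo, or MAP.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Common definition (assumed here): sort contigs from longest to
    shortest and return the length of the contig at which the running
    total first reaches at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

    A longer N50 means that half of the assembled bases sit in longer contigs; it says nothing by itself about misassemblies, which is why the record reports assembly quality separately.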

  4. EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

    Science.gov (United States)

    Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

    2016-01-01

    The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.

  5. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  6. Information extraction system

    Science.gov (United States)

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations as well as relationships and events from text documents are described herein.

  7. Databases of the marine metagenomics

    KAUST Repository

    Mineta, Katsuhiko

    2015-10-28

    The metagenomic data obtained from marine environments is highly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach of metagenomics, the recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of grasping the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data and increases data complexity. Moreover, when a metagenomic approach is used for monitoring changes over time in marine environments at multiple seawater locations, metagenomic data will accumulate at enormous speed. Because this situation is already becoming a reality at many marine research institutions and stations around the world, data management and analysis will clearly be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge should be extracted from a vast amount of data. In this review, we summarize all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that only six metagenome databases include marine metagenome data, an unexpectedly small number. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.

  8. An Improved Methodology to Overcome Key Issues in Human Fecal Metagenomic DNA Extraction.

    Science.gov (United States)

    Kumar, Jitendra; Kumar, Manoj; Gupta, Shashank; Ahmed, Vasim; Bhambi, Manu; Pandey, Rajesh; Chauhan, Nar Singh

    2016-12-01

    Microbes are ubiquitously distributed in nature, and recent culture-independent studies have highlighted the significance of gut microbiota in human health and disease. Fecal DNA is the primary source for the majority of human gut microbiome studies. However, further improvement is needed to obtain fecal metagenomic DNA with sufficient amount and good quality but low host genomic DNA contamination. In the current study, we demonstrate a quick, robust, unbiased, and cost-effective method for the isolation of high molecular weight (>23 kb) metagenomic DNA (260/280 ratio >1.8) with a good yield (55.8 ± 3.8 ng/mg of feces). We also confirm that there is very low human genomic DNA contamination (eubacterial: human genomic DNA marker genes = 227.9:1) in the human feces. The newly-developed method robustly performs for fresh as well as stored fecal samples as demonstrated by 16S rRNA gene sequencing using 454 FLX+. Moreover, 16S rRNA gene analysis indicated that compared to other DNA extraction methods tested, the fecal metagenomic DNA isolated with current methodology retains species richness and does not show microbial diversity biases, which is further confirmed by qPCR with a known quantity of spike-in genomes. Overall, our data highlight a protocol with a balance between quality, amount, user-friendliness, and cost effectiveness for its suitability toward usage for culture-independent analysis of the human gut microbiome, which provides a robust solution to overcome key issues associated with fecal metagenomic DNA isolation in human gut microbiome studies. Copyright © 2016 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  9. An Improved Methodology to Overcome Key Issues in Human Fecal Metagenomic DNA Extraction

    Directory of Open Access Journals (Sweden)

    Jitendra Kumar

    2016-12-01

    Microbes are ubiquitously distributed in nature, and recent culture-independent studies have highlighted the significance of gut microbiota in human health and disease. Fecal DNA is the primary source for the majority of human gut microbiome studies. However, further improvement is needed to obtain fecal metagenomic DNA with sufficient amount and good quality but low host genomic DNA contamination. In the current study, we demonstrate a quick, robust, unbiased, and cost-effective method for the isolation of high molecular weight (>23 kb) metagenomic DNA (260/280 ratio >1.8) with a good yield (55.8 ± 3.8 ng/mg of feces). We also confirm that there is very low human genomic DNA contamination (eubacterial: human genomic DNA marker genes = 227.9:1) in the human feces. The newly-developed method robustly performs for fresh as well as stored fecal samples as demonstrated by 16S rRNA gene sequencing using 454 FLX+. Moreover, 16S rRNA gene analysis indicated that compared to other DNA extraction methods tested, the fecal metagenomic DNA isolated with current methodology retains species richness and does not show microbial diversity biases, which is further confirmed by qPCR with a known quantity of spike-in genomes. Overall, our data highlight a protocol with a balance between quality, amount, user-friendliness, and cost effectiveness for its suitability toward usage for culture-independent analysis of the human gut microbiome, which provides a robust solution to overcome key issues associated with fecal metagenomic DNA isolation in human gut microbiome studies.

  10. A Rapid and Economical Method for Efficient DNA Extraction from Diverse Soils Suitable for Metagenomic Applications.

    Science.gov (United States)

    Devi, Selvaraju Gayathri; Fathima, Anwar Aliya; Radha, Sudhakar; Arunraj, Rex; Curtis, Wayne R; Ramya, Mohandass

    2015-01-01

    A rapid, cost effective method of metagenomic DNA extraction from soil is a useful tool for environmental microbiology. The present work describes an improved method of DNA extraction namely "powdered glass method" from diverse soils. The method involves the use of sterile glass powder for cell lysis followed by addition of 1% powdered activated charcoal (PAC) as purifying agent to remove humic substances. The method yielded substantial DNA (5.87 ± 0.04 μg/g of soil) with high purity (A260/280: 1.76 ± 0.05) and reduced humic substances (A340: 0.047 ± 0.03). The quality of the extracted DNA was compared against five different methods based on 16S rDNA PCR amplification, BamHI digestion and validated using quantitative PCR. The digested DNA was used for a metagenomic library construction with the transformation efficiency of 4 × 10^6 CFU mL^-1. Besides providing rapid, efficient and economical extraction of metagenomic DNA from diverse soils, this method's applicability is also demonstrated for cultivated organisms (Gram positive B. subtilis NRRL-B-201, Gram negative E. coli MTCC40, and a microalga C. sorokiniana UTEX#1666).

  11. A Rapid and Economical Method for Efficient DNA Extraction from Diverse Soils Suitable for Metagenomic Applications.

    Directory of Open Access Journals (Sweden)

    Selvaraju Gayathri Devi

    A rapid, cost effective method of metagenomic DNA extraction from soil is a useful tool for environmental microbiology. The present work describes an improved method of DNA extraction namely "powdered glass method" from diverse soils. The method involves the use of sterile glass powder for cell lysis followed by addition of 1% powdered activated charcoal (PAC) as purifying agent to remove humic substances. The method yielded substantial DNA (5.87 ± 0.04 μg/g of soil) with high purity (A260/280: 1.76 ± 0.05) and reduced humic substances (A340: 0.047 ± 0.03). The quality of the extracted DNA was compared against five different methods based on 16S rDNA PCR amplification, BamHI digestion and validated using quantitative PCR. The digested DNA was used for a metagenomic library construction with the transformation efficiency of 4 × 10^6 CFU mL^-1. Besides providing rapid, efficient and economical extraction of metagenomic DNA from diverse soils, this method's applicability is also demonstrated for cultivated organisms (Gram positive B. subtilis NRRL-B-201, Gram negative E. coli MTCC40, and a microalga C. sorokiniana UTEX#1666).

  12. Multimedia Information Extraction

    CERN Document Server

    Maybury, Mark T

    2012-01-01

    The advent of increasingly large consumer collections of audio (e.g., iTunes), imagery (e.g., Flickr), and video (e.g., YouTube) is driving a need not only for multimedia retrieval but also information extraction from and across media. Furthermore, industrial and government collections fuel requirements for stock media access, media preservation, broadcast news retrieval, identity management, and video surveillance.  While significant advances have been made in language processing for information extraction from unstructured multilingual text and extraction of objects from imagery and vid

  13. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    Energy Technology Data Exchange (ETDEWEB)

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to the TRF^103 of the isolate, Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.

  14. Complete genome sequence of a nonculturable Methanococcus maripaludis strain extracted in a metagenomic survey of petroleum reservoir fluids.

    Science.gov (United States)

    Wang, Xiaoyi; Greenfield, Paul; Li, Dongmei; Hendry, Philip; Volk, Herbert; Sutherland, Tara D

    2011-10-01

    Extraction of genome sequences from metagenomic data is crucial for reconstructing the metabolism of microbial communities that cannot be mimicked in the laboratory. A complete Methanococcus maripaludis genome was generated from metagenomic data derived from a thermophilic subsurface oil reservoir. M. maripaludis is a hydrogenotrophic methanogenic species that is common in mesophilic saline environments. Comparison of the genome from the thermophilic, subsurface environment with the genome of the type species will provide insight into the adaptation of a methanogenic genome to an oil reservoir environment.

  15. PROTOCOL FOR EXTRACTION OF BACTERIAL METAGENOME DNA TO PRAWN Macrobrachium carcinus L

    Directory of Open Access Journals (Sweden)

    J U González de la Cruz

    2011-07-01

    In this work we adapted a protocol for the extraction of bacterial metagenomic DNA (ADNmg) from the digestive system (intestines, stomach and hepatopancreas) of Macrobrachium carcinus L., with reference to a method for extracting bacterial DNA from soils and sediments (Rojas-Herrera et al., 2008). The methodology originally comprised enzymatic, physical, mechanical and chemical lysis; after a series of tests, the enzymatic lysis step was dropped. However, the success of ADNmg extraction was influenced mainly by the preparation of the samples, in particular the hepatopancreas, where it was necessary to remove the fat by thermal shock and phase separation by centrifugation with the sample frozen. The effectiveness of isolated DNA fragmentation was verified by denaturing gradient gel electrophoresis (DGGE) after amplification with universal primers. In general, diversity was low (19 phylotypes), ranging between the organs analyzed from 13.5 ± 1 (intestines) to 11.7 ± 0.96 (stomach). The Shannon-Weaver index (2.45), Simpson's index (10.88) and evenness (0.972), obtained from the digitized image of the gel, suggested that the phylotypes that form the gut microflora of M. carcinus are distributed unevenly between the different organs analyzed.
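
    The diversity indices quoted in the record above can be derived from band (phylotype) abundances. The sketch below assumes natural logarithms for the Shannon index, the inverse form of Simpson's index (1 / sum of squared proportions), and Pielou's evenness (Shannon index divided by the log of the number of phylotypes); the record does not state which variants its authors used, and the function name `diversity_indices` is hypothetical.

```python
import math


def diversity_indices(abundances):
    """Return (shannon, inverse_simpson, evenness) for one sample.

    Assumptions (not stated in the record): natural log for Shannon,
    inverse Simpson (1 / sum p_i^2), and Pielou evenness H / ln(S).
    """
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one positive abundance is required")
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)
    inverse_simpson = 1.0 / sum(p * p for p in props)
    # Evenness is undefined for a single phylotype; report 0.0 there.
    evenness = shannon / math.log(len(props)) if len(props) > 1 else 0.0
    return shannon, inverse_simpson, evenness


# Hypothetical band intensities from one gel lane, for illustration only.
h, inv_d, j = diversity_indices([30, 25, 20, 15, 10])
print(f"Shannon={h:.3f}  inverse Simpson={inv_d:.3f}  evenness={j:.3f}")
```

    Under these assumptions, a perfectly even community of S phylotypes gives a Shannon index of ln(S), an inverse Simpson index of S, and an evenness of 1.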

  16. Current and future resources for functional metagenomics

    Directory of Open Access Journals (Sweden)

    Kathy Nguyen Lam

    2015-10-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries – physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.

  17. Challenges in Managing Information Extraction

    Science.gov (United States)

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is all increasingly…

  18. Impact of metagenomic DNA extraction procedures on the identifiable endophytic bacterial diversity in Sorghum bicolor (L. Moench).

    Science.gov (United States)

    Maropola, Mapula Kgomotso Annah; Ramond, Jean-Baptiste; Trindade, Marla

    2015-05-01

    Culture-independent studies rely on the quantity and quality of the extracted environmental metagenomic DNA (mDNA). To fully access the plant tissue microbiome, the extracted plant mDNA should allow optimal PCR applications and the genetic content must be representative of the total microbial diversity. In this study, we evaluated the endophytic bacterial diversity retrieved using different mDNA extraction procedures. Metagenomic DNA from sorghum (Sorghum bicolor L. Moench) stem and root tissues were extracted using two classical DNA extraction protocols (CTAB- and SDS-based) and five commercial kits. The mDNA yields and quality as well as the reproducibility were compared. 16S rRNA gene terminal restriction fragment length polymorphism (t-RFLP) was used to assess the impact on endophytic bacterial community structures observed. Generally, the classical protocols obtained high mDNA yields from sorghum tissues; however, they were less reproducible than the commercial kits. Commercial kits retrieved higher quality mDNA, but with lower endophytic bacterial diversities compared to classical protocols. The SDS-based protocol enabled access to the highest sorghum endophytic diversities. Therefore, "SDS-extracted" sorghum root and stem microbiome diversities were analysed via 454 pyrosequencing, and this revealed that the two tissues harbour significantly different endophytic communities. Nevertheless, both communities are dominated by agriculturally important genera such as Microbacterium, Agrobacterium, Sphingobacterium, Herbaspirillum, Erwinia, Pseudomonas and Stenotrophomonas; which have previously been shown to play a role in plant growth promotion. This study shows that DNA extraction protocols introduce biases in culture-independent studies of environmental microbial communities by influencing the mDNA quality, which impacts the microbial diversity analyses and evaluation. Using the broad-spectrum SDS-based DNA extraction protocol allows the recovery of the most

  19. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
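
    The completeness and contamination estimates named in the record above are typically combined into a coarse quality tier. The sketch below shows one way to do that. The default thresholds (more than 90% complete with less than 5% contamination for the top tier; at least 50% complete with less than 10% contamination for the middle tier) are assumptions recalled from the published standard rather than taken from this abstract, and should be checked against the standard itself. The sketch also ignores any further criteria the standard attaches to the top tier, such as the presence of rRNA and tRNA genes. The function name `quality_tier` is hypothetical.

```python
def quality_tier(completeness, contamination,
                 high=(90.0, 5.0), medium=(50.0, 10.0)):
    """Assign a coarse quality tier to a genome bin.

    completeness and contamination are percentages. The threshold
    defaults are ASSUMED values, not quoted from the record above:
    each tuple is (minimum completeness, maximum contamination).
    """
    if not 0.0 <= completeness <= 100.0:
        raise ValueError("completeness must be between 0 and 100")
    if contamination < 0.0:
        raise ValueError("contamination cannot be negative")
    if completeness > high[0] and contamination < high[1]:
        return "high"
    if completeness >= medium[0] and contamination < medium[1]:
        return "medium"
    if contamination < medium[1]:
        return "low"
    # Too contaminated for any tier under the assumed thresholds.
    return "unclassified"


# Hypothetical bins, for illustration only.
for name, comp, cont in [("bin_1", 95.2, 1.8), ("bin_2", 63.0, 7.5)]:
    print(name, quality_tier(comp, cont))
```

    Keeping the thresholds as parameters makes it easy to substitute the exact values from the standard once they have been verified.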

  20. Comparison of Boiling and Robotics Automation Method in DNA Extraction for Metagenomic Sequencing of Human Oral Microbes.

    Directory of Open Access Journals (Sweden)

    Junya Yamagishi

    The rapid improvement of next-generation sequencing performance now enables us to analyze huge sample sets with more than ten thousand specimens. However, DNA extraction can still be a limiting step in such metagenomic approaches. In this study, we analyzed human oral microbes to compare the performance of three DNA extraction methods: PowerSoil (a method widely used in this field), QIAsymphony (a robotics method), and a simple boiling method. Dental plaque was initially collected from three volunteers in the pilot study and then expanded to 12 volunteers in the follow-up study. Bacterial flora was estimated by sequencing the V4 region of 16S rRNA following species-level profiling. Our results indicate that the efficiency of PowerSoil and QIAsymphony was comparable to the boiling method. Therefore, the boiling method may be a promising alternative because of its simplicity, cost effectiveness, and short handling time. Moreover, this method was reliable for estimating bacterial species and could be used in the future to examine the correlation between oral flora and health status. Despite this, differences in the efficiency of DNA extraction for various bacterial species were observed among the three methods. Based on these findings, there is no "gold standard" for DNA extraction. In future, we suggest that the DNA extraction method should be selected on a case-by-case basis considering the aims and specimens of the study.

  1. Comparison of Boiling and Robotics Automation Method in DNA Extraction for Metagenomic Sequencing of Human Oral Microbes.

    Science.gov (United States)

    Yamagishi, Junya; Sato, Yukuto; Shinozaki, Natsuko; Ye, Bin; Tsuboi, Akito; Nagasaki, Masao; Yamashita, Riu

    2016-01-01

    The rapid improvement of next-generation sequencing performance now enables us to analyze huge sample sets with more than ten thousand specimens. However, DNA extraction can still be a limiting step in such metagenomic approaches. In this study, we analyzed human oral microbes to compare the performance of three DNA extraction methods: PowerSoil (a method widely used in this field), QIAsymphony (a robotics method), and a simple boiling method. Dental plaque was initially collected from three volunteers in the pilot study and then expanded to 12 volunteers in the follow-up study. Bacterial flora was estimated by sequencing the V4 region of 16S rRNA following species-level profiling. Our results indicate that the efficiency of PowerSoil and QIAsymphony was comparable to the boiling method. Therefore, the boiling method may be a promising alternative because of its simplicity, cost effectiveness, and short handling time. Moreover, this method was reliable for estimating bacterial species and could be used in the future to examine the correlation between oral flora and health status. Despite this, differences in the efficiency of DNA extraction for various bacterial species were observed among the three methods. Based on these findings, there is no "gold standard" for DNA extraction. In future, we suggest that the DNA extraction method should be selected on a case-by-case basis considering the aims and specimens of the study.

  2. Extracting useful information from images

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey

    2011-01-01

    The paper presents an overview of methods for extracting useful information from digital images. It covers various approaches that utilized different properties of images, like intensity distribution, spatial frequencies content and several others. A few case studies including isotropic...... and heterogeneous, congruent and non-congruent images are used to illustrate how the described methods work and to compare some of them...

  3. MetaLIMS, a simple open-source laboratory information management system for small metagenomic labs.

    Science.gov (United States)

    Heinle, Cassie Elizabeth; Gaultier, Nicolas Paul Eugène; Miller, Dana; Purbojati, Rikky Wenang; Lauro, Federico M

    2017-06-01

    As the cost of sequencing continues to fall, smaller groups increasingly initiate and manage larger sequencing projects and take on the complexity of data storage for high volumes of samples. This has created a need for low-cost laboratory information management systems (LIMS) that contain flexible fields to accommodate the unique nature of individual labs. Many labs do not have a dedicated information technology position, so LIMS must also be easy to set up and maintain with minimal technical proficiency. MetaLIMS is a free and open-source web-based application available via GitHub. The focus of MetaLIMS is to store sample metadata prior to sequencing and analysis pipelines. Initially designed for environmental metagenomics labs, in addition to storing generic sample collection information and DNA/RNA processing information, the user can also add fields specific to the user's lab. MetaLIMS can also produce a basic sequencing submission form compatible with the proprietary Clarity LIMS system used by some sequencing facilities. To help ease the technical burden associated with web deployment, MetaLIMS offers the option of using commercial web hosting combined with MetaLIMS bash scripts for ease of setup. MetaLIMS overcomes key challenges common in LIMS by giving labs access to a low-cost and open-source tool that also has the flexibility to meet individual lab needs and an option for easy deployment. By making the web application open source and hosting it on GitHub, we hope to encourage the community to build upon MetaLIMS, making it more robust and tailored to the needs of more researchers. © The Authors 2017. Published by Oxford University Press.

  4. BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Cathy H. [Univ. of Delaware, Newark, DE (United States). Center for Bioinformatics and Computational Biology]; Hirschman, Lynette [The MITRE Corporation, Bedford, MA (United States)]

    2016-10-29

    The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.

  5. Optimizing protocols for extraction of bacteriophages prior to metagenomic analyses of phage communities in the human gut.

    Science.gov (United States)

    Castro-Mejía, Josué L; Muhammed, Musemma K; Kot, Witold; Neve, Horst; Franz, Charles M A P; Hansen, Lars H; Vogensen, Finn K; Nielsen, Dennis S

    2015-11-17

    The human gut is densely populated with archaea, eukaryotes, bacteria, and their viruses, such as bacteriophages. Advances in high-throughput sequencing (HTS) as well as bioinformatics have opened new opportunities for characterizing the viral communities harbored in our gut. However, limited attention has been given to the efficiency of protocols dealing with extraction of phages from fecal communities prior to HTS and their impact on the metagenomic dataset. We describe two optimized methods for extraction of phages from fecal samples based on tangential-flow filtration (TFF) and polyethylene glycol precipitation (PEG) approaches using an adapted method from a published protocol as control (literature-adapted protocol (LIT)). To quantify phage recovery, samples were spiked with low numbers of c2, ϕ29, and T4 phages (representatives of the Siphoviridae, Podoviridae, and Myoviridae families, respectively) and their concentration (plaque-forming units) followed at every step during the extraction procedure. Compared with LIT, TFF and PEG had higher recovery of all spiked phages, yielding up to 16 times more phage particles (PPs) and up to 68 times more phage DNA per volume, increasing thus the chances of extracting low abundant phages. TFF- and PEG-derived metaviromes showed 10% increase in relative abundance of Caudovirales and unclassified phages infecting gut-associated bacteria (>92% for TFF and PEG, 82.4% for LIT). Our methods obtained lower relative abundance of the Myoviridae family (32.5%, LIT 22.6%), which was achieved with the enhanced conditions of our procedures (e.g., reduced filter clogging). A high degree of phage diversity in samples extracted using TFF and PEG was documented by transmission electron microscopy. Two procedures (TFF and PEG) for extraction of bacteriophages from fecal samples were optimized using a set of spiked bacteriophages as process control. 
    These protocols are highly efficient tools for extraction and purification of PPs prior...

  6. A primer on metagenomics.

    Directory of Open Access Journals (Sweden)

    John C Wooley

    2010-02-01

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

  7. Extracting information from multiplex networks

    Science.gov (United States)

    Iacovacci, Jacopo; Bianconi, Ginestra

    2016-06-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering, and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures increases greatly the level of information that is possible to extract from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function $\tilde{\Theta}^{S}$ for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.

  8. An Improved Method for High Quality Metagenomics DNA Extraction from Human and Environmental Samples

    DEFF Research Database (Denmark)

    Bag, Satyabrata; Saha, Bipasa; Mehta, Ojasvi

    2016-01-01

    To explore the natural microbial community of any ecosystem by high-resolution molecular approaches including next generation sequencing, it is extremely important to develop a sensitive and reproducible DNA extraction method that facilitates isolation of microbial DNA of sufficient purity...... and quantity from culturable and uncultured microbial species living in that environment. Proper lysis of heterogeneous community microbial cells without damaging their genomes is a major challenge. In this study, we have developed an improved method for extraction of community DNA from different environmental...... methodologies and the supremacy of our method was confirmed. Maximum recovery of genomic DNA in the absence of substantial amount of impurities made the method convenient for nucleic acid extraction. The nucleic acids obtained using this method are suitable for different downstream applications. This improved...

  9. Transductive Pattern Learning for Information Extraction

    National Research Council Canada - National Science Library

    McLernon, Brian; Kushmerick, Nicholas

    2006-01-01

    .... We present TPLEX, a semi-supervised learning algorithm for information extraction that can acquire extraction patterns from a small amount of labelled text in conjunction with a large amount of unlabelled text...

  10. Microbial food safety: Potential of DNA extraction methods for use in diagnostic metagenomics

    DEFF Research Database (Denmark)

    Josefsen, Mathilde Hasseldam; Andersen, Sandra Christine; Christensen, Julia

    2015-01-01

    ) yielding protocols. The PowerLyzer PowerSoil DNA Isolation Kit performed significantly better than all other protocols tested. Selected protocols were modified, i.e., extended heating and homogenization, resulting in increased yields of total DNA. For QIAamp Fast DNA Stool Mini Kit (Qiagen) a 7-fold...... of the protocols to extract DNA was observed. The highest DNA yield was obtained with the PowerLyzer PowerSoil DNA Isolation Kit, whereas the FastDNA SPIN Kit for Feces (MP Biomedicals) resulted in the highest amount of PCR-amplifiable C. jejuni DNA....

  11. Metagenomics and Applications

    Directory of Open Access Journals (Sweden)

    L Rafati

    2016-11-01

    Introduction: Although bacteria are highly diverse in nature, only very few of them can be grown and isolated in current standard laboratories. Over the last decade, metagenomics, as a new field of research, has worked to clarify the genomes of non-cultured microbes, and researchers around the world are studying this group of bacteria in search of new compounds such as new antibiotics, anti-cancer agents, new enzymes and biomolecules. Methods: This article is a review study. Texts and internet sources were collected and studied by searching key words in reliable scientific resources and sites, including Google Scholar, PubMed, ScienceDirect, SID and Scopus, for the years 2000 to 2013. Results: The data collection instrument in the study included all printed metagenomics-related texts. Although metagenomics is currently used to screen samples, it is expected to gain a stronger position as a technique used alongside culture media and other traditional techniques. Metagenomics is used most in clinical cases in which conventional techniques cannot identify the microbial cause. Skilled scientists are therefore needed to run the tests and analyze the information. Conclusion: This paper focuses on some of the latest achievements of metagenomics and its application in new drugs, detection of enzymes, biotechnology potential and the environment.

  12. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
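    The two simplest sequence patterns named in this abstract, (G+C) content and trinucleotide usage, can be illustrated with a short Python sketch. The abstract does not say how the authors computed them, so everything here is an assumption made for illustration: the function names `gc_content` and `trinucleotide_usage` are hypothetical, trinucleotides are counted in overlapping windows with a step of one base, and any window containing an ambiguous base is skipped.

```python
from collections import Counter


def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases, so report zero rather than divide by zero
    return (counts["G"] + counts["C"]) / acgt


def trinucleotide_usage(seq):
    """Return the relative frequency of each overlapping trinucleotide.

    Assumption: a sliding window of width 3 and step 1; windows that
    contain anything other than A, C, G or T are ignored.
    """
    seq = seq.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    windows = [w for w in windows if set(w) <= set("ACGT")]
    total = len(windows)
    if total == 0:
        return {}
    return {tri: n / total for tri, n in Counter(windows).items()}


if __name__ == "__main__":
    read = "CAGCAGCAG"  # toy sequence, not data from the study
    print(gc_content(read))
    print(trinucleotide_usage(read))
```

    Profiles like these can be computed for coding and noncoding regions separately, which is the kind of comparison the abstract describes; the codon-usage pattern would additionally require reading-frame information that this sketch does not model.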

  13. Marine metagenomics as a source for bioprospecting

    KAUST Repository

    Kodzius, Rimantas

    2015-08-12

    This review summarizes usage of genome-editing technologies for metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable or uncultivable. Metagenomics is providing especially valuable information for uncultivable samples. The novel genes, pathways and genomes can be deduced. Therefore, metagenomics, particularly genome engineering and systems biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective way to efficiently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and fine or bulk chemicals.

  14. Soil-specific limitations for access and analysis of soil microbial communities by metagenomics.

    Science.gov (United States)

    Lombard, Nathalie; Prestat, Emmanuel; van Elsas, Jan Dirk; Simonet, Pascal

    2011-10-01

    Metagenomics approaches represent an important way to acquire information on the microbial communities present in complex environments like soil. However, to what extent do these approaches provide us with a true picture of soil microbial diversity? Soil is a challenging environment to work with. Its physicochemical properties affect microbial distributions inside the soil matrix, metagenome extraction and its subsequent analyses. To better understand the bias inherent to soil metagenome 'processing', we focus on soil physicochemical properties and their effects on the perceived bacterial distribution. In the light of this information, each step of soil metagenome processing is then discussed, with an emphasis on strategies for optimal soil sampling. Then, the interaction of cells and DNA with the soil matrix and the consequences for microbial DNA extraction are examined. Soil DNA extraction methods are compared and the veracity of the microbial profiles obtained is discussed. Finally, soil metagenomic sequence analysis and exploitation methods are reviewed. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  15. DKIE: Open Source Information Extraction for Danish

    DEFF Research Database (Denmark)

    Derczynski, Leon; Field, Camilla Vilhelmsen; Bøgh, Kenneth Sejdenfaden

    2014-01-01

    Danish is a major Scandinavian language spoken daily by around six million people. However, it lacks a unified, open set of NLP tools. This demonstration will introduce DKIE, an extensible open-source toolkit for processing Danish text. We implement an information extraction architecture for Danish...

  16. Metagenomic Analysis of Dairy Bacteriophages

    DEFF Research Database (Denmark)

    Muhammed, Musemma K.; Kot, Witold; Neve, Horst

    2017-01-01

    Despite their huge potential for characterizing the biodiversity of phages, metagenomic studies are currently not available for dairy bacteriophages, partly due to the lack of a standard procedure for phage extraction. We optimized an extraction method that allows removal of the bulk protein from ...... diversity. Possible co-induction of temperate P335 prophages and satellite phages in one of the whey mixtures was also observed....

  17. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  18. Extracting the information backbone in online system.

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency.

  19. Exploring neighborhoods in the metagenome universe.

    Science.gov (United States)

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well-suitable for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
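    The profile-based nearest-neighbor search described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CoMet-Universe implementation: the abstract only says "vector-based distance measures", so the Euclidean distance on sum-normalized profiles is an assumed choice, and the names `nearest_neighbors` and `neighborhood_accuracy` as well as the toy profiles are hypothetical.

```python
import math


def normalize(profile):
    """Scale a count profile so that its entries sum to 1."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile has no counts")
    return [x / total for x in profile]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, database, k=10):
    """Return the names of the k database profiles closest to the query.

    `database` maps a metagenome name to its feature-count profile.
    """
    q = normalize(query)
    scored = sorted(
        (euclidean(q, normalize(profile)), name)
        for name, profile in database.items()
    )
    return [name for _, name in scored[:k]]


def neighborhood_accuracy(query_label, neighbor_names, labels):
    """Fraction of neighbors whose habitat label matches the query's label."""
    if not neighbor_names:
        return 0.0
    hits = sum(1 for name in neighbor_names if labels[name] == query_label)
    return hits / len(neighbor_names)


if __name__ == "__main__":
    # Toy profiles over three hypothetical feature categories.
    database = {"soil_a": [10, 0, 0], "soil_b": [9, 1, 0], "gut_a": [0, 5, 5]}
    labels = {"soil_a": "soil", "soil_b": "soil", "gut_a": "gut"}
    neighbors = nearest_neighbors([8, 2, 0], database, k=2)
    print(neighbors, neighborhood_accuracy("soil", neighbors, labels))
```

    Averaging `neighborhood_accuracy` over many queries gives a figure comparable in spirit to the "nine out of ten nearest neighbors" statement, although the real evaluation depends on the profiling method and label set used by the authors.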

  20. Exploring Neighborhoods in the Metagenome Universe

    Science.gov (United States)

    Aßhauer, Kathrin P.; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-01-01

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well-suitable for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis. PMID:25026170

  1. Metaproteomics: extracting and mining proteome information to characterize metabolic activities in microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    Abraham, Paul E [ORNL; Giannone, Richard J [ORNL; Xiong, Weili [ORNL; Hettich, Robert {Bob} L [ORNL

    2014-01-01

    Contemporary microbial ecology studies usually employ one or more omics approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the microbial membership, providing a direct link between the genetic potential and functional metabolism. The successful deployment of metaproteomics research depends on the integration of high-quality experimental and bioinformatic techniques for uncovering the metabolic activities of a microbial community in a way that is complementary to other meta-omic approaches. The essential, quality-defining informatics steps in metaproteomics investigations are: (1) construction of the metagenome, (2) functional annotation of predicted protein-coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations in metaproteome studies in order to highlight the key considerations needed for successful implementation of this powerful community-biology tool.

  2. Extraction of information from unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H.; DeLand, S.M.; Crowder, S.V.

    1995-11-01

    Extracting information from unstructured text has become an emphasis in recent years due to the large amount of text now electronically available. This status report describes the findings and work done by the end of the first year of a two-year LDRD. Requirements of the approach included that it model the information in a domain independent way. This means that it would differ from current systems by not relying on previously built domain knowledge and that it would do more than keyword identification. Three areas that are discussed and expected to contribute to a solution include (1) identifying key entities through document level profiling and preprocessing, (2) identifying relationships between entities through sentence level syntax, and (3) combining the first two with semantic knowledge about the terms.

  3. LAND COVER INFORMATION EXTRACTION USING LIDAR DATA

    Directory of Open Access Journals (Sweden)

    A. Shaker

    2012-07-01

    Light Detection and Ranging (LiDAR) systems are used intensively in terrain surface modelling based on the range data determined by the LiDAR sensors. LiDAR sensors record the distance between the sensor and the targets (range data) with a capability to record the strength of the backscatter energy reflected from the targets (intensity data). The LiDAR sensors use the near-infrared spectrum range which has high separability in the reflected energy from different targets. This characteristic is investigated to implement the LiDAR intensity data in land-cover classification. The goal of this paper is to investigate and evaluate the use of LiDAR data only (range and intensity data) to extract land cover information. Different bands generated from the LiDAR data (Normal Heights, Intensity Texture, Surfaces Slopes, and PCA) are combined with the original data to study the influence of including these layers on the classification accuracy. The maximum likelihood classifier is used to conduct the classification process for the LiDAR data as one of the best classification techniques from literature. A study area covering an urban district in Burnaby, British Columbia, Canada, is selected to test the different band combinations to extract four information classes: buildings, roads and parking areas, trees, and low vegetation (grass) areas. The results show that an overall accuracy of more than 70% can be achieved using the intensity data, and other auxiliary data generated from the range and intensity data. Bands of the Principal Component Analysis (PCA) are also created from the LiDAR original and auxiliary data. Similar overall accuracy of the results can be achieved using the four bands extracted from the Principal Component Analysis (PCA).

  4. Land Cover Information Extraction Using LIDAR Data

    Science.gov (United States)

    Shaker, A.; El-Ashmawy, N.

    2012-07-01

    Light Detection and Ranging (LiDAR) systems are used intensively in terrain surface modelling based on the range data determined by the LiDAR sensors. LiDAR sensors record the distance between the sensor and the targets (range data) with a capability to record the strength of the backscatter energy reflected from the targets (intensity data). The LiDAR sensors use the near-infrared spectrum range which has high separability in the reflected energy from different targets. This characteristic is investigated to implement the LiDAR intensity data in land-cover classification. The goal of this paper is to investigate and evaluate the use of LiDAR data only (range and intensity data) to extract land cover information. Different bands generated from the LiDAR data (Normal Heights, Intensity Texture, Surfaces Slopes, and PCA) are combined with the original data to study the influence of including these layers on the classification accuracy. The maximum likelihood classifier is used to conduct the classification process for the LiDAR data as one of the best classification techniques from literature. A study area covering an urban district in Burnaby, British Columbia, Canada, is selected to test the different band combinations to extract four information classes: buildings, roads and parking areas, trees, and low vegetation (grass) areas. The results show that an overall accuracy of more than 70% can be achieved using the intensity data, and other auxiliary data generated from the range and intensity data. Bands of the Principal Component Analysis (PCA) are also created from the LiDAR original and auxiliary data. Similar overall accuracy of the results can be achieved using the four bands extracted from the Principal Component Analysis (PCA).
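    The classification step mentioned in this abstract, a maximum likelihood classifier followed by an overall accuracy figure, can be sketched as below. This is a simplified illustration and not the authors' workflow: it assumes each class is Gaussian with independent bands (a diagonal covariance), whereas maximum likelihood classifiers in remote sensing usually estimate a full covariance matrix per class. The function names and the two-band toy pixels (height, intensity) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_ml(samples, labels, min_var=1e-6):
    """Estimate a per-class mean and variance for every band.

    Assumption: bands are treated as independent within a class.
    `min_var` keeps the variance away from zero for constant bands.
    """
    grouped = defaultdict(list)
    for pixel, label in zip(samples, labels):
        grouped[label].append(pixel)
    model = {}
    for cls, rows in grouped.items():
        n = len(rows)
        bands = list(zip(*rows))
        means = [sum(band) / n for band in bands]
        variances = [
            max(sum((v - m) ** 2 for v in band) / n, min_var)
            for band, m in zip(bands, means)
        ]
        model[cls] = (means, variances)
    return model


def log_likelihood(pixel, means, variances):
    """Log of the Gaussian density of a pixel under one class model."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(pixel, means, variances)
    )


def classify(pixel, model):
    """Assign the class whose model gives the pixel the highest likelihood."""
    return max(model, key=lambda cls: log_likelihood(pixel, *model[cls]))


def overall_accuracy(predicted, reference):
    """Share of pixels whose predicted class equals the reference class."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    # Toy training pixels: (normalized height, intensity); values are invented.
    train = [(10.0, 200.0), (12.0, 210.0), (11.0, 190.0),
             (8.0, 60.0), (9.0, 50.0), (7.0, 55.0)]
    train_labels = ["buildings"] * 3 + ["trees"] * 3
    model = fit_gaussian_ml(train, train_labels)
    test_pixels = [(11.0, 205.0), (8.0, 52.0)]
    predicted = [classify(p, model) for p in test_pixels]
    print(predicted, overall_accuracy(predicted, ["buildings", "trees"]))
```

    Adding further derived bands to each pixel tuple (for example a slope or texture value) mirrors the band-combination experiments the abstract describes, with the overall accuracy recomputed for each combination.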

  5. Evaluation of methods for the concentration and extraction of viruses from sewage in the context of metagenomic sequencing

    DEFF Research Database (Denmark)

    Hjelmsø, Mathis Hjort; Hellmér, Maria; Fernandez-Cassi, Xavier

    2017-01-01

    concentrations. This necessitates a step of sample concentration to allow for sensitive virus detection. Additionally, viruses harbor a large diversity of both surface and genome structures, which makes universal viral genomic extraction difficult. Current studies have tackled these challenges in many different...... this study aimed to evaluate the efficiency of four commonly applied viral concentration techniques (precipitation with polyethylene glycol, organic flocculation with skim milk, monolithic adsorption filtration and glass wool filtration) and extraction methods (Nucleospin RNA XS, QIAamp Viral RNA Mini Kit...... or PowerViral® Environmental RNA/DNA Isolation Kit. Highest viral specificity was found in samples concentrated by precipitation with polyethylene glycol or extracted with Nucleospin RNA XS. Detection of viral pathogens depended on the method used. These results contribute to the understanding of method...

  6. Extracting the information backbone in online system.

    Directory of Open Access Journals (Sweden)

    Qian-Ming Zhang

    Information overload is a serious problem in modern society and many solutions such as recommender systems have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency.

  7. Extracting the Information Backbone in Online System

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender systems have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such “less can be more” feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency. PMID:23690946
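
    The link-removal idea described above can be illustrated with a minimal sketch. The abstract does not state the authors' actual removal criteria, so the rule below (keep only each user's most recent links) is an assumed stand-in for a time-aware removal step, and the function name and `keep_fraction` parameter are hypothetical.

```python
import math
from collections import defaultdict


def remove_oldest_links(links, keep_fraction):
    """Keep only the most recent share of each user's links.

    links: iterable of (user, obj, timestamp) tuples describing a
           user-object bipartite network.
    keep_fraction: share of each user's links to keep, in (0, 1].
    This rule is an illustrative assumption, not the published algorithm.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Group links by user so that removal is applied per user.
    by_user = defaultdict(list)
    for user, obj, timestamp in links:
        by_user[user].append((timestamp, obj))

    kept = []
    for user, items in by_user.items():
        items.sort(reverse=True)  # newest links first
        n_keep = max(1, math.ceil(len(items) * keep_fraction))
        kept.extend((user, obj, ts) for ts, obj in items[:n_keep])
    return kept
```

    A recommender would then be trained on the reduced link list; whether this improves accuracy depends on the data and must be checked empirically.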

  8. [Information extraction methodology used in electronic medical records].

    Science.gov (United States)

    Chen, Yingying; Ye, Feng

    2011-01-01

    We apply information extraction technology to some parts of medical records and extract disease information, in order to accumulate experience for extracting complete information from medical records. This paper attempts to use a dictionary and rules to achieve named entity recognition. Information extraction is based on shallow parsing and uses a sentence pattern matching method with the help of a three-level finite state automaton.
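
    As a rough illustration of dictionary-based named entity recognition, the sketch below performs greedy longest-match lookup over tokens. The dictionary entries, the English example text, and the function name are assumptions made for illustration; the rules and the finite state automaton mentioned in the abstract are not reproduced here.

```python
import re

# Hypothetical disease dictionary; a real system would load a curated lexicon.
DISEASE_DICTIONARY = {"diabetes", "hypertension", "chronic kidney disease"}


def find_entities(text, dictionary=DISEASE_DICTIONARY, max_len=3):
    """Return dictionary terms found in text, preferring the longest match."""
    tokens = re.findall(r"\w+", text.lower())
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                entities.append(candidate)
                i += n
                break
        else:
            # No dictionary term starts at this token.
            i += 1
    return entities
```

    Longest-match lookup keeps a multi-word term such as "chronic kidney disease" from being split into shorter dictionary hits.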

  9. Extraction of quantifiable information from complex systems

    CERN Document Server

    Dahmen, Wolfgang; Griebel, Michael; Hackbusch, Wolfgang; Ritter, Klaus; Schneider, Reinhold; Schwab, Christoph; Yserentant, Harry

    2014-01-01

    In April 2007, the Deutsche Forschungsgemeinschaft (DFG) approved the Priority Program 1324 “Mathematical Methods for Extracting Quantifiable Information from Complex Systems.” This volume presents a comprehensive overview of the most important results obtained over the course of the program. Mathematical models of complex systems provide the foundation for further technological developments in science, engineering and computational finance. Motivated by the trend toward steadily increasing computer power, ever more realistic models have been developed in recent years. These models have also become increasingly complex, and their numerical treatment poses serious challenges. Recent developments in mathematics suggest that, in the long run, much more powerful numerical solution strategies could be derived if the interconnections between the different fields of research were systematically exploited at a conceptual level. Accordingly, a deeper understanding of the mathematical foundations as w...

  10. Comparative fecal metagenomics unveils unique functional capacity of the swine gut

    Directory of Open Access Journals (Sweden)

    Martinson John

    2011-05-01

    Abstract Background Uncovering the taxonomic composition and functional capacity within the swine gut microbial consortia is of great importance to animal physiology and health as well as to food and water safety due to the presence of human pathogens in pig feces. Nonetheless, limited information on the functional diversity of the swine gut microbiome is available. Results Analysis of 637,722 pyrosequencing reads (130 megabases) generated from Yorkshire pig fecal DNA extracts was performed to help better understand the microbial diversity and largely unknown functional capacity of the swine gut microbiome. Swine fecal metagenomic sequences were annotated using both MG-RAST and JGI IMG/M-ER pipelines. Taxonomic analysis of metagenomic reads indicated that swine fecal microbiomes were dominated by Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, Prevotella spp. dominated the swine fecal metagenome, while some genes associated with Treponema and Anaerovibrio species were found to be exclusively within the pig fecal metagenomic sequences analyzed. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of the swine metagenome. Genes associated with stress, virulence, cell wall and cell capsule were also abundant. Virulence factors associated with antibiotic resistance genes with highest sequence homology to genes in Bacteroidetes, Clostridia, and Methanosarcina were numerous within the gene families unique to the swine fecal metagenomes. Other abundant proteins unique to the distal swine gut shared high sequence homology to putative carbohydrate membrane transporters. Conclusions The results from this metagenomic survey demonstrated the presence of genes associated with resistance to antibiotics and carbohydrate metabolism, suggesting that the swine gut microbiome may be shaped by husbandry practices.

  11. Metagenomic Analysis of Dairy Bacteriophages: Extraction Method and Pilot Study on Whey Samples Derived from Using Undefined and Defined Mesophilic Starter Cultures.

    Science.gov (United States)

    Muhammed, Musemma K; Kot, Witold; Neve, Horst; Mahony, Jennifer; Castro-Mejía, Josué L; Krych, Lukasz; Hansen, Lars H; Nielsen, Dennis S; Sørensen, Søren J; Heller, Knut J; van Sinderen, Douwe; Vogensen, Finn K

    2017-10-01

    Despite being potentially highly useful for characterizing the biodiversity of phages, metagenomic studies are currently not available for dairy bacteriophages, partly due to the lack of a standard procedure for phage extraction. We optimized an extraction method that allows the removal of the bulk protein from whey and milk samples with losses of less than 50% of spiked phages. The protocol was applied to extract phages from whey in order to test the notion that members of Lactococcus lactis 936 (now Sk1virus), P335, c2 (now C2virus) and Leuconostoc phage groups are the most frequently encountered in the dairy environment. The relative abundance and diversity of phages in eight and four whey mixtures from dairies using undefined mesophilic mixed-strain cultures containing Lactococcus lactis subsp. lactis biovar diacetylactis and Leuconostoc species (i.e., DL starter cultures) and defined cultures, respectively, were assessed. Results obtained from transmission electron microscopy and high-throughput sequence analyses revealed the dominance of Lc. lactis 936 phages (order Caudovirales, family Siphoviridae) in dairies using undefined DL starter cultures and Lc. lactis c2 phages (order Caudovirales, family Siphoviridae) in dairies using defined cultures. The 936 and Leuconostoc phages demonstrated limited diversity. Possible coinduction of temperate P335 prophages and satellite phages in one of the whey mixtures was also observed. IMPORTANCE The method optimized in this study could provide an important basis for understanding the dynamics of the phage community (abundance, development, diversity, evolution, etc.) in dairies with different sizes, locations, and production strategies. It may also enable the discovery of previously unknown phages, which is crucial for the development of rapid molecular biology-based methods for phage burden surveillance systems. The dominance of only a few phage groups in the dairy environment signifies the depth of knowledge

  12. Respiratory Information Extraction from Electrocardiogram Signals

    KAUST Repository

    Amin, Gamal El Din Fathy

    2010-12-01

    The Electrocardiogram (ECG) is a tool measuring the electrical activity of the heart, and it is extensively used for diagnosis and monitoring of heart diseases. The ECG signal reflects not only the heart activity but also many other physiological processes. The respiratory activity is a prominent process that affects the ECG signal due to the close proximity of the heart and the lungs. In this thesis, several methods for the extraction of respiratory process information from the ECG signal are presented. These methods allow an estimation of the lung volume and the lung pressure from the ECG signal. The potential benefit of this is to eliminate the corresponding sensors used to measure the respiration activity. A reduction of the number of sensors connected to patients will increase patients’ comfort and reduce the costs associated with healthcare. As a further result, the efficiency of diagnosing respirational disorders will increase since the respiration activity can be monitored with a common, widely available method. The developed methods can also improve the detection of respirational disorders that occur while patients are sleeping. Such disorders are commonly diagnosed in sleeping laboratories where the patients are connected to a number of different sensors. Any reduction of these sensors will result in a more natural sleeping environment for the patients and hence a higher sensitivity of the diagnosis.

  13. Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes

    NARCIS (Netherlands)

    Gori, F.; Tringe, S.G.; Folino, G.; Van Hijum, S.A.F.T.; Op den Camp, H.J.M.; Jetten, M.S.M.; Marchiori, E.

    2013-01-01

    Background Sequencing technologies have different biases, in single-genome sequencing and metagenomic sequencing; these can significantly affect ORFs recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related

  14. Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes

    NARCIS (Netherlands)

    Gori, F.; Tringe, S.G.; Folino, G.; Hijum, S.A.F.T. van; Camp, H.J. Op den; Jetten, M.S.; Marchiori, E.

    2013-01-01

    BACKGROUND: Sequencing technologies have different biases, in single-genome sequencing and metagenomic sequencing; these can significantly affect ORFs recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related

  15. [Construction of large fragment metagenome library of natural mangrove soil].

    Science.gov (United States)

    Jiang, Yun-Xia; Zheng, Tian-Ling

    2007-11-01

    Applying our optimized direct extraction method, the percentage of large fragment DNA in the total extracted mangrove soil DNA was significantly increased. The large fragment metagenome library derived from natural mangrove soil over four seasons was successfully constructed by the optimized DNA extraction and electroelution purification method. All of the clones had recombinant cosmids and each differed in their fragment profiles when cosmid DNA was extracted from 12 randomly picked colonies and digested with BamHI. The average insert size for this library was larger than 35 kbp. This culture-independent library encompassed at least 335 Mbp of valuable genetic information of mangrove soil microbes. It allowed mining of valuable intertidal microbial resources to become a reality. It is a recommended method for researchers who have not yet succeeded in constructing large insert environmental libraries or for those beginning research in this field, so that they can avoid repetitive, fussy work.

  16. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education; Volume 22; Issue 3. Metagenomics at Grass Roots. Sudeshna ... benefit human health, agriculture, and ecosystem functions. This article provides a brief history of technical advances in metagenomics, including DNA sequencing methods, and some case studies.

  17. Functional metagenomics of extreme environments.

    Science.gov (United States)

    Mirete, Salvador; Morgante, Verónica; González-Pastor, José Eduardo

    2016-04-01

    The bioprospecting of enzymes that operate under extreme conditions is of particular interest for many biotechnological and industrial processes. Nevertheless, there is a considerable limitation to retrieve novel enzymes as only a small fraction of microorganisms derived from extreme environments can be cultured under standard laboratory conditions. Functional metagenomics has the advantage of not requiring the cultivation of microorganisms or previous sequence information to known genes, thus representing a valuable approach for mining enzymes with new features. In this review, we summarize studies showing how functional metagenomics was employed to retrieve genes encoding for proteins involved not only in molecular adaptation and resistance to extreme environmental conditions but also in other enzymatic activities of biotechnological interest. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Sample-based XPath Ranking for Web Information Extraction

    NARCIS (Netherlands)

    Jundt, Oliver; van Keulen, Maurice

    Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive as some

  19. Online Semi-Supervised Learning: Algorithm and Application in Metagenomics

    NARCIS (Netherlands)

    Imangaliyev, S.; Keijser, B.J.F.; Crielaard, W.; Tsivtsivadze, E.

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled, namely, the dataset contains partial information about the problem of interest. This work presents an algorithm and

  20. Online semi-supervised learning: algorithm and application in metagenomics

    NARCIS (Netherlands)

    Imangaliyev, S.; Keijser, B.J.; Crielaard, W.; Tsivtsivadze, E.; Li, G.Z.; Kim, S.; Hughes, M.; McLachlan, G.; Sun, H.; Hu, X.; Ressom, H.; Liu, B.; Liebman, M.

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled, namely, the dataset contains partial information about the problem of interest. This work presents an algorithm

  1. Information Extraction Using Distant Supervision and Semantic Similarities

    Directory of Open Access Journals (Sweden)

    PARK, Y.

    2016-02-01

    Information extraction is one of the main research tasks in natural language processing and text mining that extracts useful information from unstructured sentences. Information extraction techniques include named entity recognition, relation extraction, and co-reference resolution. Among them, relation extraction refers to a task that extracts semantic relations between entities such as personal and geographic names in documents. This is an important research area, which is used in knowledge base construction and question answering systems. This study presents relation extraction using a distant supervision learning technique, one of the semi-supervised learning methods that have been spotlighted in recent years for reducing the manual work and costs required for supervised learning. That is, this study proposes a method that improves relation extraction by enhancing a distant supervision learning technique: a clustering method is applied to create a learning corpus, and semantic analysis is applied to relations that are difficult to identify using existing distant supervision. Through comparison experiments on various semantic similarity measures, similarity calculation methods that are useful for relation extraction using distant supervision are identified, and a large number of accurate relation triples can be extracted using the proposed structural advantages and semantic similarity comparison.

  2. The Agent of extracting Internet Information with Lead Order

    Science.gov (United States)

    Mo, Zan; Huang, Chuliang; Liu, Aijun

    To carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described that deals with the problems of extracting Internet information caused by the non-standard and skimble-scamble structure of Chinese websites. The agent comprises three modules, each of which handles one stage of the information extraction process. An HTTP tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to structure the extracted information expressed in natural language is also discussed.

  3. Compressive Information Extraction: A Dynamical Systems Approach

    Science.gov (United States)

    2016-01-24

    sparsely encoded in very large data streams. (a) Target tracking in an urban canyon; (b) and (c) sample frames showing contextually abnormal events: onset...tractable convex relaxation can be obtained by using the nuclear norm as a surrogate for rank, leading to a convex semi-definite program that can be...extraction to identify contextually abnormal sequences (see section 2.2.3). Formally, the problem of interest can be stated as establishing whether a noisy

  4. Shotgun metagenomic data streams: surfing without fear

    Energy Technology Data Exchange (ETDEWEB)

    Berendzen, Joel R [Los Alamos National Laboratory

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
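
    The comparison with web search engines suggests an inverted index over short sequence patterns. The sketch below is a generic k-mer index written under that assumption; it does not reproduce the authors' phylogenetic signature method, and the function names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(references, k):
    """Map every k-mer to the set of reference names that contain it.

    references: dict of name -> DNA sequence string.
    """
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def lookup(read, index, k):
    """Count, per reference, how many k-mers of the read occur in it."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return hits
```

    Each lookup costs time proportional to the read length rather than to the size of the reference collection, which is the property that makes index-based approaches attractive for streaming data.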

  5. Information extraction from multi-institutional radiology reports

    Science.gov (United States)

    Hassanpour, Saeed; Langlotz, Curtis P.

    2016-01-01

    Objectives The radiology report is the most important source of clinical imaging information. It documents critical information about the patient’s health and the radiologist’s interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. Material and methods We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non

  6. ChemEx: information extraction system for chemical data curation.

    Science.gov (United States)

    Tharatipyakul, Atima; Numnark, Somrak; Wichadakul, Duangdao; Ingsriswang, Supawadee

    2012-01-01

    Manual curation of chemical data from publications is error-prone and time-consuming, and it makes data sets hard to keep up to date. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures are usually described in images, information extraction needs to combine structure image recognition with text mining. We have developed ChemEx, a chemical information extraction system. ChemEx processes both text and images in publications. The text annotator is able to extract compound, organism, and assay entities from text content, while structure image recognition enables translation of chemical raster images to machine-readable format. A user can view annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests. ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.

  7. MGC: a metagenomic gene caller.

    Science.gov (United States)

    El Allali, Achraf; Rose, John R

    2013-01-01

    Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome. In this paper we introduce a metagenomics gene caller (MGC) which is an improvement over the state-of-the-art prediction algorithm Orphelia. Orphelia uses a two-stage machine learning approach and computes a model that classifies extracted ORFs from fragmented sequences. We hypothesise and demonstrate evidence that sequences need separate models based on their local GC-content in order to avoid the noise introduced to a single model computed with sequences from the entire GC spectrum. We have also added two amino-acid features based on the benefit of amino-acid usage shown in our previous research. Our algorithm is able to predict genes and translation initiation sites (TIS) more accurately than Orphelia which uses a single model. Learning separate models for several pre-defined GC-content regions as opposed to a single model approach improves the performance of the neural network as demonstrated by the experimental results presented in this paper. The inclusion of amino-acid usage features also helps improve the overall accuracy of our algorithm. MGC's improvement sets the ground for further investigation into the use of GC-content to separate data for training models in machine learning based gene finders.
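
    The idea of learning separate models per GC-content range can be sketched as a routing step: compute the GC fraction of a fragment and pick the matching bin. The bin edges below are hypothetical, since the abstract does not give MGC's actual ranges, and the model associated with each bin is left out.

```python
# Hypothetical GC-content bin edges (fractions); MGC's real ranges are not
# stated in the abstract.
GC_BIN_EDGES = (0.0, 0.35, 0.45, 0.55, 0.65, 1.0)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


def gc_bin(seq, edges=GC_BIN_EDGES):
    """Index of the GC-content bin that a fragment falls into."""
    gc = gc_content(seq)
    for i in range(len(edges) - 1):
        if gc < edges[i + 1]:
            return i
    # A GC fraction of exactly 1.0 belongs to the last bin.
    return len(edges) - 2
```

    A gene caller built this way would hold one trained classifier per bin and score each ORF with the classifier selected by `gc_bin`.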

  8. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Metagenomics is a robust, interdisciplinary approach for studying microbial community composition, function, and dynamics. It typically involves a core of molecular biology, microbiology, ecology, statistics, and computational biology. Exciting outcomes anticipated from these studies include unraveling of complex interactions ...

  9. Cause Information Extraction from Financial Articles Concerning Business Performance

    Science.gov (United States)

    Sakai, Hiroyuki; Masuyama, Shigeru

    We propose a method of extracting cause information from Japanese financial articles concerning business performance. Our method acquires cause information, e.g., “zidousya no uriage ga koutyou (Sales of cars were good)”. Cause information is useful for investors in selecting companies to invest in. Our method extracts cause information in the form of causal expressions by using statistical information and initial clue expressions automatically. Our method can extract causal expressions without predetermined patterns or complex rules given by hand, and is expected to be applicable to other tasks that acquire phrases with a particular meaning, not limited to cause information. We compared our method with our previous one, originally proposed for extracting phrases concerning traffic accident causes, and experimental results showed that our new method outperforms our previous one.

  10. Ocean microbial metagenomics

    Science.gov (United States)

    Kerkhof, Lee J.; Goodman, Robert M.

    2009-09-01

    Technology for accessing the genomic DNA of microorganisms, directly from environmental samples without prior cultivation, has opened new vistas to understanding microbial diversity and functions. Especially as applied to soils and the oceans, environments on Earth where microbial diversity is vast, metagenomics and its emergent approaches have the power to transform rapidly our understanding of environmental microbiology. Here we explore select recent applications of the metagenomic suite to ocean microbiology.

  11. Interactive metagenomic visualization in a Web browser.

    Science.gov (United States)

    Ondov, Brian D; Bergman, Nicholas H; Phillippy, Adam M

    2011-09-30

    A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.
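
    Hierarchical abundance displays of this kind rest on rolling read counts up a taxonomy so that every node carries the total of its descendants. The sketch below shows that aggregation step only; it is not Krona's code or input format, and the function names and example lineages are assumptions.

```python
from collections import Counter


def roll_up(counts):
    """Sum read counts for every prefix of every lineage.

    counts: dict mapping a lineage tuple, e.g. ("Bacteria", "Firmicutes"),
            to the number of reads assigned exactly to that lineage.
    Returns a Counter mapping each lineage prefix to its cumulative count.
    """
    totals = Counter()
    for lineage, n in counts.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += n
    return totals


def relative_abundance(totals, counts):
    """Convert cumulative counts to fractions of all assigned reads."""
    grand_total = sum(counts.values())
    return {node: n / grand_total for node, n in totals.items()}
```

    With cumulative totals in hand, each ring of a radial chart can be drawn from the nodes at one depth, sized by their relative abundance.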

  12. Interactive metagenomic visualization in a Web browser

    Directory of Open Access Journals (Sweden)

    Phillippy Adam M

    2011-09-01

    Abstract Background A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  13. Mining knowledge from text repositories using information extraction ...

    Indian Academy of Sciences (India)

    language documents. Thus, IE systems can extract structured information from unstructured text. One type of IE is named entity extraction and then creation of filled templates (Konchady. 2009). The named entity extractor identifies references to particular kinds of objects such as names of people, companies, and locations.

  14. Techniques for information extraction from compressed GPS traces : final report.

    Science.gov (United States)

    2015-12-31

    Developing techniques for extracting information requires a good understanding of methods used to compress the traces. Many techniques for compressing trace data consisting of position (i.e., latitude/longitude) and time values have been developed....

  15. Mars Target Encyclopedia: Information Extraction for Planetary Science

    Science.gov (United States)

    Wagstaff, K. L.; Francis, R.; Gowda, T.; Lu, Y.; Riloff, E.; Singh, K.

    2017-06-01

    Mars surface targets / and published compositions / Seek and ye will find. We used text mining methods to extract information from LPSC abstracts about the composition of Mars surface targets. Users can search by element, mineral, or target.

  16. Can we replace curation with information extraction software?

    Science.gov (United States)

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.

  17. Moving Target Information Extraction Based on Single Satellite Image

    Directory of Open Access Journals (Sweden)

    ZHAO Shihu

    2015-03-01

    Full Text Available The spatial and time variant effects in high resolution satellite push broom imaging are analyzed. A spatial and time variant imaging model is established. A moving target information extraction method is proposed based on a single satellite remote sensing image. The experiment computes two airplanes' flying speed using ZY-3 multispectral image and proves the validity of spatial and time variant model and moving information extracting method.

  18. Addressing Information Proliferation: Applications of Information Extraction and Text Mining

    Science.gov (United States)

    Li, Jingjing

    2013-01-01

    The advent of the Internet and the ever-increasing capacity of storage media have made it easy to store, deliver, and share enormous volumes of data, leading to a proliferation of information on the Web, in online libraries, on news wires, and almost everywhere in our daily lives. Since our ability to process and absorb this information remains…

  19. Information extraction from multi-institutional radiology reports.

    Science.gov (United States)

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. 
    We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We

  20. Fine-grained information extraction from German transthoracic echocardiography reports.

    Science.gov (United States)

    Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank

    2015-11-12

    Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed at the development and evaluation of an information extraction component with a fine-grained terminology that makes it possible to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated according to documents with different layouts. The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90% of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout. The developed terminology and the proposed information extraction system make it possible to extract fine-grained information from German semi-structured transthoracic echocardiography reports
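
    The abstract reports both micro- and macro-averaged scores, which can diverge when some attributes are rare. The sketch below shows the two averaging schemes under the usual definitions; the class names and the true positive, false positive, and false negative counts are hypothetical and are not taken from the study.

```python
def f1_from_counts(tp, fp, fn):
    """F1 written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(per_class):
    """per_class maps a class name to a (tp, fp, fn) tuple."""
    # Macro: score each class on its own, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in per_class.values()) / len(per_class)
    # Micro: pool the counts over all classes, then score once.
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1_from_counts(tp, fp, fn), macro


# Hypothetical counts: one frequent, well-recognized attribute and one rare one.
counts = {"frequent_attribute": (90, 5, 5), "rare_attribute": (2, 1, 3)}
micro, macro = micro_macro_f1(counts)
```

    With these invented counts the micro average is dominated by the frequent attribute, while the macro average is pulled down by the rare one, which is why a gap between the two is informative.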

  1. Automatic information extraction from unstructured mammography reports using distributed semantics.

    Science.gov (United States)

    Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L

    2018-02-01

    To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based, and, therefore, require substantial manual effort to build these systems. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports that have comparable accuracy to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines dependency-based parse trees with distributed semantics for generating structured information frames about particular findings/abnormalities from the free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. The study of the extraction of 3-D informations

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Min Ki [Korea Univ., Seoul (Korea); Kim, Jin Hun; Kim, Hui Yung; Lee, Gi Sik; Lee, Yung Shin [Sokyung Univ., Seoul (Korea)

    1998-04-01

    To extract three-dimensional information from the 3-D real world, two methods are applied: a stereo image method and a virtual reality environment method. 1. Stereo image method. Matching methods are applied to pairs of stereo images to find the corresponding points in the two images. Various methods are applied to solve this problem. 2. Virtual reality environment method. As an alternative way to extract 3-D information, a virtual reality environment is used. It is very useful for finding the 6 DOF of given target points in 3-D space. We considered the accuracy and reliability of the 3-D information. 34 figs., 4 tabs. (Author)

  3. Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

    Energy Technology Data Exchange (ETDEWEB)

    Morgan, Jenna L.; Darling, Aaron E.; Eisen, Jonathan A.

    2009-12-01

    Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing. Methodology/Principal Findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism was observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized. Conclusions/Significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with

  4. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries.

    Science.gov (United States)

    Coughlan, Laura M; Cotter, Paul D; Hill, Colin; Alvarez-Ordóñez, Avelino

    2015-01-01

    Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present, and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalyzing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries.

  5. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries

    Directory of Open Access Journals (Sweden)

    Laura M Coughlan

    2015-06-01

    Full Text Available Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalysing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries.

  6. Metagenomics of extreme environments.

    Science.gov (United States)

    Cowan, D A; Ramond, J-B; Makhalanyane, T P; De Maayer, P

    2015-06-01

    Whether they are exposed to extremes of heat or cold, or buried deep beneath the Earth's surface, microorganisms have an uncanny ability to survive under these conditions. This ability to survive has fascinated scientists for nearly a century, but the recent development of metagenomics and 'omics' tools has allowed us to make huge leaps in understanding the remarkable complexity and versatility of extremophile communities. Here, in the context of the recently developed metagenomic tools, we discuss recent research on the community composition, adaptive strategies and biological functions of extremophiles. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Knowledge Dictionary for Information Extraction on the Arabic Text Data

    Directory of Open Access Journals (Sweden)

    Wahyu Jauharis Saputra

    2013-04-01

    Full Text Available Information extraction is an early stage of textual data analysis. It is required to obtain information from textual data that can be used in later analysis steps, such as classification and categorization. Textual data are strongly influenced by their language. Arabic is gaining significant attention in many studies because the Arabic language is very different from others and, in contrast to other languages, tools and research on Arabic are still lacking. The information extracted using the knowledge dictionary is a concept of expression. A knowledge dictionary is usually constructed manually by an expert, which takes a long time and is specific to one problem only. This paper proposes a method for automatically building a knowledge dictionary. The knowledge dictionary is formed by classifying sentences having the same concept, assuming that they will have a high similarity value. The extracted concepts can be used as features for subsequent computational processes such as classification or categorization. The dataset used in this paper was an Arabic text dataset. The extraction result was tested using a decision tree classification engine; the highest precision value obtained was 71.0% and the highest recall value was 75.0%.

  8. Medication information extraction with linguistic pattern matching and semantic rules.

    Science.gov (United States)

    Spasic, Irena; Sarafraz, Farzaneh; Keane, John A; Nenadic, Goran

    2010-01-01

    This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract certain information about medications used by a patient from his/her medical report. The aim was to extract the following information for each medication: name, dosage, mode/route, frequency, duration and reason. The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were acquired from the training dataset and public resources such as the UMLS and relevant web pages. Information extracted by pattern matching was combined together using context-sensitive heuristic rules. The system was applied to a set of 547 previously unseen discharge summaries, and the extracted information was evaluated against a manually prepared gold standard consisting of 251 documents. The overall ranking of the participating teams was obtained using the micro-averaged F-measure as the primary evaluation metric. The implemented method achieved the micro-averaged F-measure of 81% (with 86% precision and 77% recall), which ranked this system third in the challenge. The significance tests revealed the system's performance to be not significantly different from that of the second ranked system. Relative to other systems, this system achieved the best F-measure for the extraction of duration (53%) and reason (46%). Based on the F-measure, the performance achieved (81%) was in line with the initial agreement between human annotators (82%), indicating that such a system may greatly facilitate the process of extracting relevant information from medical records by providing a solid basis for a manual review process.
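
    The reported overall F-measure can be checked against the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the inputs are the rounded figures from the abstract, so the result only agrees to about two decimal places.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F1."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded precision and recall as stated in the abstract.
overall = f_measure(0.86, 0.77)
```

    Here `overall` comes out at roughly 0.81, consistent with the 81% quoted above.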

  9. Ontology-Based Information Extraction for Business Intelligence

    Science.gov (United States)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  10. Surveillance of Foodborne Pathogens: Towards Diagnostic Metagenomics of Fecal Samples

    Directory of Open Access Journals (Sweden)

    Sandra Christine Andersen

    2018-01-01

    Full Text Available Diagnostic metagenomics is a rapidly evolving laboratory tool for culture-independent tracing of foodborne pathogens. The method has the potential to become a generic platform for detection of most pathogens and many sample types. Today, however, it is still at an early and experimental stage. Studies show that metagenomic methods, from sample storage and DNA extraction to library preparation and shotgun sequencing, have a great influence on data output. To construct protocols that extract the complete metagenome but with minimal bias is an ongoing challenge. Many different software strategies for data analysis are being developed, and several studies applying diagnostic metagenomics to human clinical samples have been published, detecting, and sometimes, typing bacterial infections. It is possible to obtain a draft genome of the pathogen and to develop methods that can theoretically be applied in real-time. Finally, diagnostic metagenomics can theoretically be better geared than conventional methods to detect co-infections. The present review focuses on the current state of test development, as well as practical implementation of diagnostic metagenomics to trace foodborne bacterial infections in fecal samples from animals and humans.

  11. Bracken: estimating species abundance in metagenomics data

    Directory of Open Access Journals (Sweden)

    Jennifer Lu

    2017-01-01

    Full Text Available Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
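
    The idea of re-estimating species abundance from reads that a classifier could only place at a higher rank can be sketched with Bayes' rule. This is a simplified illustration and not Bracken's implementation: the function `redistribute_genus_reads`, the per-species probabilities, and the read counts are assumptions made up for the example.

```python
def redistribute_genus_reads(species_reads, genus_reads, genus_given_species):
    """Reassign reads stuck at genus level to the species below it.

    species_reads: reads classified directly at each species (used as the prior).
    genus_reads: number of reads classified only at the parent genus.
    genus_given_species: assumed probability that a read from a species
        ends up classified at genus level rather than at that species.
    """
    # Unnormalized posterior for each species: likelihood times prior.
    weights = {s: genus_given_species[s] * species_reads[s] for s in species_reads}
    total = sum(weights.values())
    if total == 0:
        return dict(species_reads)  # nothing to base a redistribution on
    return {
        s: species_reads[s] + genus_reads * weights[s] / total
        for s in species_reads
    }


# Hypothetical numbers, invented for illustration only.
estimates = redistribute_genus_reads(
    species_reads={"species_A": 600, "species_B": 200},
    genus_reads=400,
    genus_given_species={"species_A": 0.5, "species_B": 0.5},
)
```

    The total number of reads is conserved: the 400 genus-level reads are split 300 to 100, in proportion to the posterior weights.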

  12. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Metagenomics is a robust, interdisciplinary approach for studying microbial community composition, function, and dynamics. It typically involves a core of molecular biology, microbiology, ecology, statistics, and computational biology. Exciting outcomes anticipated from these studies include unraveling of complex ...

  13. PDF text classification to leverage information extraction from publication reports.

    Science.gov (United States)

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p[…]) higher than that of a machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p[…]). […] classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
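
    A multi-pass sieve applies rules in a fixed order, usually most precise first, and each pass labels only what earlier passes left unlabeled. The sketch below shows that control flow only; the four rules, their thresholds, and the sample snippets are hypothetical and are not the rules used in the study.

```python
import re


def sieve_classify(snippets, passes, default="BODYTEXT"):
    """Run ordered rule passes; a later pass never overrides an earlier label."""
    labels = [None] * len(snippets)
    for rule in passes:
        for i, text in enumerate(snippets):
            if labels[i] is None:
                labels[i] = rule(i, text)
    return [label or default for label in labels]


# Hypothetical rules, ordered from most to least precise.
def title_rule(index, text):
    return "TITLE" if index == 0 else None


def abstract_rule(index, text):
    return "ABSTRACT" if text.lower().startswith("abstract") else None


def metadata_rule(index, text):
    return "METADATA" if re.search(r"doi:|©|copyright", text, re.IGNORECASE) else None


def semistructure_rule(index, text):
    # A high share of digits suggests a table row or similar fragment.
    digits = sum(ch.isdigit() for ch in text)
    return "SEMISTRUCTURE" if text and digits / len(text) > 0.3 else None


snippets = [
    "PDF text classification for systematic reviews",
    "Abstract Data extraction is time-consuming.",
    "doi:10.0000/example 2016",
    "12.5 13.1 9.8 44.9",
    "We evaluated the algorithm on a gold standard.",
]
labels = sieve_classify(
    snippets, [title_rule, abstract_rule, metadata_rule, semistructure_rule]
)
```

    Because the metadata pass runs before the digit-ratio pass, the DOI line is labeled METADATA even though it would also satisfy the later rule; ordering is what gives the sieve its precision.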

  14. The metagenomic telescope.

    Directory of Open Access Journals (Sweden)

    Balázs Szalkai

    Full Text Available Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes, originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we again applied hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis); and we searched for similarities to those profiles in model organisms. We have found well known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms.

  15. Automated extraction of radiation dose information for CT examinations.

    Science.gov (United States)

    Cook, Tessa S; Zimmerman, Stefan; Maidment, Andrew D A; Kim, Woojin; Boonn, William W

    2010-11-01

    Exposure to radiation as a result of medical imaging is currently in the spotlight, receiving attention from Congress as well as the lay press. Although scanner manufacturers are moving toward including effective dose information in the Digital Imaging and Communications in Medicine headers of imaging studies, there is a vast repository of retrospective CT data at every imaging center that stores dose information in an image-based dose sheet. As such, it is difficult for imaging centers to participate in the ACR's Dose Index Registry. The authors have designed an automated extraction system to query their PACS archive and parse CT examinations to extract the dose information stored in each dose sheet. First, an open-source optical character recognition program processes each dose sheet and converts the information to American Standard Code for Information Interchange (ASCII) text. Each text file is parsed, and radiation dose information is extracted and stored in a database which can be queried using an existing pathology and radiology enterprise search tool. Using this automated extraction pipeline, it is possible to perform dose analysis on the >800,000 CT examinations in the PACS archive and generate dose reports for all of these patients. It is also possible to more effectively educate technologists, radiologists, and referring physicians about exposure to radiation from CT by generating report cards for interpreted and performed studies. The automated extraction pipeline enables compliance with the ACR's reporting guidelines and greater awareness of radiation dose to patients, thus resulting in improved patient care and management. Copyright © 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.
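
    The parsing step that follows optical character recognition can be sketched with a regular expression. The text layout below, the field names, and the sample values are hypothetical: real dose sheets differ between scanner vendors, and the abstract does not describe the authors' parsing rules.

```python
import re

# Hypothetical OCR output; the layout and values are invented for illustration.
SAMPLE = """Series 2 Helical CTDIvol 12.50 DLP 431.20
Series 3 Helical CTDIvol 8.10 DLP 275.00
Total DLP 706.20"""

# One acquisition row: series number, scan type, CTDIvol (mGy), DLP (mGy*cm).
ROW = re.compile(r"Series\s+(\d+)\s+(\w+)\s+CTDIvol\s+([\d.]+)\s+DLP\s+([\d.]+)")


def parse_dose_sheet(text):
    """Return one dictionary per acquisition row found in the OCR text."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            rows.append({
                "series": int(match.group(1)),
                "scan_type": match.group(2),
                "ctdi_vol": float(match.group(3)),
                "dlp": float(match.group(4)),
            })
    return rows


records = parse_dose_sheet(SAMPLE)
```

    Rows in this form can be inserted into a database table directly; comparing the summed per-series DLP with the sheet's own total line is one inexpensive way to catch OCR misreads.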

  16. Spatiotemporal Information Extraction from a Historic Expedition Gazetteer

    Directory of Open Access Journals (Sweden)

    Mafkereseb Kassahun Bekele

    2016-11-01

    Full Text Available Historic expeditions are events that are flavored by exploratory, scientific, military or geographic characteristics. Such events are often documented in literature, journey notes or personal diaries. A typical historic expedition involves multiple site visits and their descriptions contain spatiotemporal and attributive contexts. Expeditions involve movements in space that can be represented by triplet features (location, time and description). However, such features are implicit and innate parts of textual documents. Extracting the geospatial information from these documents requires understanding the contextualized entities in the text. To this end, we developed a semi-automated framework that has multiple Information Retrieval and Natural Language Processing components to extract the spatiotemporal information from a two-volume historic expedition gazetteer. Our framework has three basic components, namely, the Text Preprocessor, the Gazetteer Processing Machine and the JAPE (Java Annotation Pattern Engine) Transducer. We used the Brazilian Ornithological Gazetteer as an experimental dataset and extracted the spatial and temporal entities from entries that refer to three expeditioners’ site visits (which took place between 1910 and 1926) and mapped the trajectory of each expedition using the extracted information. Finally, one of the mapped trajectories was manually compared with a historical reference map of that expedition to assess the reliability of our framework.

  17. Baseline information on using fermented crude extracts from ...

    African Journals Online (AJOL)

    Bio-pesticides, when used as a post-planting pesticide, are limited by their potential ability to suppress the pest and their degree of phytotoxicity. Baseline information on the suitability of fermented crude extracts (FCE) of Cucumis africanus fruit as a post-planting bio-nematicide was determined on Meloidogyne incognita ...

  18. Quantitative metagenomic analyses based on average genome size normalization

    DEFF Research Database (Denmark)

    Frank, Jeremy Alexander; Sørensen, Søren Johannes

    2011-01-01

    Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can...... provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data...... by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different...

  19. Soil metagenomics and tropical soil productivity

    OpenAIRE

    Garrett, Karen A.

    2009-01-01

    This presentation summarizes research in the soil metagenomics cross-cutting research activity. Soil metagenomics studies soil microbial communities as contributors to soil health. CCRA-4 (Soil Metagenomics)

  20. Rapid automatic keyword extraction for information retrieval and analysis

    Science.gov (United States)

    Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]

    2012-03-06

    Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
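
    The scoring steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the stop-word list is a tiny hypothetical one, the word score is taken to be degree divided by frequency (one possible "function of co-occurrence degree, co-occurrence frequency, or both"), and the function names and the `top_fraction` parameter are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "or", "the", "to", "with"}


def candidate_keywords(text):
    """Split text at delimiters and stop words into candidate keywords (lists of words)."""
    candidates = []
    # Punctuation and line breaks act as phrase delimiters.
    for fragment in re.split(r"[.,;:!?()\[\]\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9\-]*", fragment):
            if word in STOP_WORDS:
                if current:
                    candidates.append(current)
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(current)
    return candidates


def word_scores(candidates):
    """Score each word as co-occurrence degree divided by frequency (assumed choice)."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            # Degree counts the words sharing a candidate with this word, itself included.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def extract_keywords(text, top_fraction=1 / 3):
    """Rank candidates by summed word scores and keep the top-scoring portion."""
    candidates = candidate_keywords(text)
    scores = word_scores(candidates)
    keyword_scores = {
        " ".join(phrase): sum(scores[word] for word in phrase)
        for phrase in candidates
    }
    ranked = sorted(keyword_scores.items(), key=lambda item: (-item[1], item[0]))
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]
```

    Because longer candidates accumulate more word scores, this scoring favors multi-word phrases over isolated words.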

  1. Extracting Semantic Information from Visual Data: A Survey

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2016-03-01

    Full Text Available The traditional environment maps built by mobile robots include both metric ones and topological ones. These maps are navigation-oriented and not adequate for service robots to interact with or serve human users who normally rely on the conceptual knowledge or semantic contents of the environment. Therefore, the construction of semantic maps becomes necessary for building an effective human-robot interface for service robots. This paper reviews recent research and development in the field of visual-based semantic mapping. The main focus is placed on how to extract semantic information from visual data in terms of feature extraction, object/place recognition and semantic representation methods.

  2. Source-specific Informative Prior for i-Vector Extraction

    DEFF Research Database (Denmark)

    Shepstone, Sven Ewan; Lee, Kong Aik; Li, Haizhou

    2015-01-01

    An i-vector is a low-dimensional fixed-length representation of a variable-length speech utterance, and is defined as the posterior mean of a latent variable conditioned on the observed feature sequence of an utterance. The assumption is that the prior for the latent variable is non......-informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that extracting i-vectors for a heterogeneous dataset, containing speech samples recorded from multiple sources, using informative priors instead is applicable, and leads to favorable results...

  3. Evaluation of ddRADseq for reduced representation metagenome sequencing

    Directory of Open Access Journals (Sweden)

    Michael Y. Liu

    2017-09-01

    Full Text Available Background: Profiling of microbial communities via metagenomic shotgun sequencing has enabled researchers to gain unprecedented insight into microbial community structure and the functional roles of community members. This study describes a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. Methods: This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that are between a pair of restriction sites located within close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection. We assessed the performance of the metagenomic ddRADseq approach by applying the full method to human stool samples and generating sequence data. Results: The ddRADseq data yield a similar estimate of community taxonomic profile as obtained from shotgun metagenome sequencing of the same human stool samples. No obvious bias with respect to genomic G + C content and the estimated relative species abundance was detected. Discussion: Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.

  4. Multiple comparative metagenomics using multiset k-mer counting

    Directory of Open Access Journals (Sweden)

    Gaëtan Benoit

    2016-11-01

    Full Text Available Background: Large-scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results. Methods: These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today’s metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results: Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques which rely on an all-versus-all sequence alignment strategy or which are based on taxonomic profiling.
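
    The idea of replacing species counts with k-mer counts can be sketched as follows. This is a toy, single-threaded illustration and not Simka's parallel counting strategy; the function names are hypothetical, and Bray-Curtis and Jaccard are used here only as familiar examples of a quantitative and a qualitative ecological distance.

```python
from collections import Counter

DNA = set("ACGT")


def kmer_counts(reads, k):
    """Count every k-mer of length k across a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= DNA:  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def bray_curtis(counts_a, counts_b):
    """Quantitative distance: uses k-mer abundances in place of species abundances."""
    shared = sum(
        min(counts_a[kmer], counts_b[kmer])
        for kmer in counts_a.keys() & counts_b.keys()
    )
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total


def jaccard_distance(counts_a, counts_b):
    """Qualitative distance: uses only k-mer presence or absence."""
    kmers_a, kmers_b = set(counts_a), set(counts_b)
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)
```

    Holding all counts in memory as done here does not scale to billions of reads, which is the limitation the abstract says the parallel counting strategy addresses.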

  5. Evaluation of ddRADseq for reduced representation metagenome sequencing.

    Science.gov (United States)

    Liu, Michael Y; Worden, Paul; Monahan, Leigh G; DeMaere, Matthew Z; Burke, Catherine M; Djordjevic, Steven P; Charles, Ian G; Darling, Aaron E

    2017-01-01

    Profiling of microbial communities via metagenomic shotgun sequencing has enabled researchers to gain unprecedented insight into microbial community structure and the functional roles of community members. This study describes a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that are between a pair of restriction sites located within close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection. We assessed the performance of the metagenomic ddRADseq approach by applying the full method to human stool samples and generating sequence data. The ddRADseq data yield a similar estimate of community taxonomic profile as obtained from shotgun metagenome sequencing of the same human stool samples. No obvious bias with respect to genomic G + C content and the estimated relative species abundance was detected. Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.

  6. Advanced applications of natural language processing for performing information extraction

    CERN Document Server

    Rodrigues, Mário

    2015-01-01

    This book explains how to create information extraction (IE) applications that are able to tap the vast amount of relevant information available in natural language sources: Internet pages, official documents such as laws and regulations, books and newspapers, and the social web. Readers are introduced to the problem of IE and its current challenges and limitations, supported with examples. The book discusses the need to fill the gap between documents, data, and people, and provides a broad overview of the technology supporting IE. The authors present a generic architecture for developing systems that are able to learn how to extract relevant information from natural language documents, and illustrate how to implement working systems using state-of-the-art and freely available software tools. The book also discusses concrete applications illustrating IE uses. · Provides an overview of state-of-the-art technology in information extraction (IE), discussing achievements and limitations for t...

  7. The YNP metagenome project

    DEFF Research Database (Denmark)

    Inskeep, William P.; Jay, Zackary J.; Tringe, Susannah G.

    2013-01-01

    The Yellowstone geothermal complex contains over 10,000 diverse geothermal features that host numerous phylogenetically deeply rooted and poorly understood archaea, bacteria, and viruses. Microbial communities in high-temperature environments are generally less diverse than soil, marine, sediment......, and environmental variables. Twenty geochemically distinct geothermal ecosystems representing a broad spectrum of Yellowstone hot-spring environments were used for metagenomic and geochemical analysis and included approximately equal numbers of: (1) phototrophic mats, (2) “filamentous streamer” communities, and (3...

  8. Automatically extracting information needs from complex clinical questions.

    Science.gov (United States)

    Cao, Yong-gang; Cimino, James J; Ely, John; Yu, Hong

    2010-12-01

    Clinicians pose complex clinical questions when seeing patients, and identifying the answers to those questions in a timely manner helps improve the quality of patient care. We report here on two natural language processing models, namely, automatic topic assignment and keyword identification, that together automatically and effectively extract information needs from ad hoc clinical questions. Our study is motivated in the context of developing the larger clinical question answering system AskHERMES (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS). We developed supervised machine-learning systems to automatically assign predefined general categories (e.g. etiology, procedure, and diagnosis) to a question. We also explored both supervised and unsupervised systems to automatically identify keywords that capture the main content of the question. We evaluated our systems on 4654 annotated clinical questions that were collected in practice. We achieved an F1 score of 76.0% for the task of general topic classification and 58.0% for keyword extraction. Our systems have been implemented into the larger question answering system AskHERMES. Our error analyses suggested that inconsistent annotation in our training data has hurt both question analysis tasks. Our systems, available at http://www.askhermes.org, can automatically extract information needs from both short (the number of word tokens <20) and long questions (the number of word tokens >20), and from both well-structured and ill-formed questions. We speculate that the performance of general topic classification and keyword extraction can be further improved if consistently annotated data are made available. Copyright © 2010 Elsevier Inc. All rights reserved.

  9. Reference Information Extraction and Processing Using Random Conditional Fields

    Directory of Open Access Journals (Sweden)

    Tudor Groza

    2012-06-01

    Full Text Available Fostering both the creation and the linking of data with the scope of supporting the growth of the Linked Data Web requires us to improve the acquisition and extraction mechanisms of the underlying semantic metadata. This is particularly important for the scientific publishing domain, where currently most of the datasets are being created in an author-driven, manual manner. In addition, such datasets capture only fragments of the complete metadata, usually omitting important elements such as the references, although they represent valuable information. In this paper we present an approach that aims at dealing with this aspect of extraction and processing of reference information. The experimental evaluation shows that, currently, our solution handles diverse types of reference format very well, thus making it usable for, or adaptable to, any area of scientific publishing.

  10. Real-Time Information Extraction from Big Data

    Science.gov (United States)

    2015-10-01

    from big data is enormously complex and extremely challenging. We argue that data movement is of crucial importance in big data analytics, and show...important communications hardware and networks underlying efficient big data analytics, which is often performed by Graph Processors capable of consuming...the connecting edges of a graph. Therefore, a holistically optimized Graph Processor (HOGP) for real-time information extraction from big data

  11. Automated Extraction of Family History Information from Clinical Notes

    Science.gov (United States)

    Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S.; Winden, Tamara J.; Carter, Elizabeth W.; Melton, Genevieve B.

    2014-01-01

    Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication (“indicator phrases”), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications. PMID:25954443

  12. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    Science.gov (United States)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-01-01

    Novel gene finding is one of the emerging fields in environmental research. In past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound. Many methods for the cultivation and screening of these novel microorganisms are available in the literature. All of these methods are efficient for screening microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions such as hot springs, frozen glaciers, and acid mine drainage cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature, nutrients, and their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1% of the total microbes present on Earth. The remaining 99%, the uncultivated majority, remains inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is extracted directly from environmental samples such as soil, seawater, and acid mine drainage, followed by construction and screening of a metagenomic library. With ongoing research, a huge amount of metagenomic data is accumulating. Understanding these data is an essential step in extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from the metagenome. The bioinformatic requirements of metagenomic data analysis differ in theory and practice. This paper reviews the tools that are available for metagenomic data analysis and the capabilities of such tools: what they can do and their web availability.

  13. Generating viral metagenomes from the coral holobiont

    Directory of Open Access Journals (Sweden)

    Karen Dawn Weynberg

    2014-05-01

    Full Text Available Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which are inherently difficult in corals because of the complex holobiont community, an extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis.

  14. Transliteration normalization for Information Extraction and Machine Translation

    Directory of Open Access Journals (Sweden)

    Yuval Marton

    2014-12-01

    Full Text Available Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural language processing (NLP) tasks. This work aims to identify and cluster name spelling variants using a Statistical Machine Translation method: word alignment. The variants are identified by being aligned to the same “pivot” name in another language (the source language in Machine Translation settings). Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, names with similar spellings in the target language are clustered and then normalized to a canonical form. With this approach, tens of thousands of high-precision name transliteration spelling variants are extracted from sentence-aligned bilingual corpora in Arabic and English (in both languages). When these normalized name spelling variants are applied to Information Extraction tasks, improvements over strong baseline systems are observed. When applied to Machine Translation tasks, a large improvement potential is shown.
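
    The cluster-and-normalize step can be sketched as follows. This is a simplified illustration rather than the author's method: it assumes word alignment has already produced (pivot, spelling, count) triples, it uses only string edit distance and ignores the translation and transliteration probabilities, and the function names, the greedy clustering, the `max_distance` threshold, and the choice of the most frequent spelling as the canonical form are all assumptions.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (char_a != char_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def normalize_variants(aligned, max_distance=2):
    """Map each (pivot, spelling) pair to a canonical spelling.

    aligned: iterable of (pivot_name, target_spelling, count) triples,
    assumed to come from an upstream word-alignment step.
    """
    by_pivot = defaultdict(lambda: defaultdict(int))
    for pivot, spelling, count in aligned:
        by_pivot[pivot][spelling] += count

    canonical = {}
    for pivot, spellings in by_pivot.items():
        # Visit the most frequent spellings first so they become cluster heads.
        ordered = sorted(spellings, key=lambda s: (-spellings[s], s))
        heads = []
        for spelling in ordered:
            for head in heads:
                if edit_distance(spelling, head) <= max_distance:
                    canonical[(pivot, spelling)] = head
                    break
            else:
                heads.append(spelling)
                canonical[(pivot, spelling)] = spelling
    return canonical
```

    Restricting comparisons to spellings that share a pivot keeps unrelated names with similar spellings from being merged.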

  15. Exploring Antibiotic Resistance Genes and Metal Resistance Genes in Plasmid Metagenomes from Wastewater Treatment Plants

    Directory of Open Access Journals (Sweden)

    An-Dong Li

    2015-09-01

    Full Text Available Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer, they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge and digested sludge of two wastewater treatment plants. Compared with the metagenomes of the total DNA extracted from the same sectors of the wastewater treatment plant, the plasmid metagenomes had significantly higher annotation rates, indicating that the functional genes on plasmids are commonly shared by those studied microorganisms. Meanwhile, the plasmid metagenomes also encoded many more genes related to defense mechanisms, including ARGs. Searching against an antibiotic resistance genes (ARGs) database and a metal resistance genes (MRGs) database revealed a broad spectrum of antibiotic (323 out of a total 618 subtypes) and metal resistance genes (23 out of a total 23 types) on these plasmid metagenomes. The influent plasmid metagenomes contained many more resistance genes (both ARGs and MRGs) than the activated sludge and the digested sludge metagenomes. Sixteen novel plasmids with a complete circular structure that carried these resistance genes were assembled from the plasmid metagenomes. The results of this study demonstrated that the plasmids in wastewater treatment plants could be important reservoirs for resistance genes, and may play a significant role in the horizontal transfer of these genes.

  16. Microbial Metagenomics: Beyond the Genome

    Science.gov (United States)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation, and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  17. A Bioinformatician's Guide to Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets by contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

  18. Knowledge discovery: Extracting usable information from large amounts of data

    Energy Technology Data Exchange (ETDEWEB)

    Whiteson, R.

    1998-12-31

    The threat of nuclear weapons proliferation is a problem of worldwide concern. Safeguards are the key to nuclear nonproliferation and data is the key to safeguards. The safeguards community has access to a huge and steadily growing volume of data. The advantages of this data-rich environment are obvious: there is a great deal of information which can be utilized. The challenge is to effectively apply proven and developing technologies to find and extract usable information from that data. That information must then be assessed and evaluated to produce the knowledge needed for crucial decision making. Efficient and effective analysis of safeguards data will depend on utilizing technologies to interpret the large, heterogeneous data sets that are available from diverse sources. With an order-of-magnitude increase in the amount of data from a wide variety of technical, textual, and historical sources, there is a vital need to apply advanced computer technologies to support all-source analysis. There are techniques of data warehousing, data mining, and data analysis that can provide analysts with tools that will expedite their extracting usable information from the huge amounts of data to which they have access. Computerized tools can aid analysts by integrating heterogeneous data, evaluating diverse data streams, automating retrieval of database information, prioritizing inputs, reconciling conflicting data, doing preliminary interpretations, discovering patterns or trends in data, and automating some of the simpler prescreening tasks that are time-consuming and tedious. Thus knowledge discovery technologies can provide a foundation of support for the analyst. Rather than spending time sifting through often irrelevant information, analysts could use their specialized skills in a focused, productive fashion. This would allow them to make their analytical judgments with more confidence and spend more of their time doing what they do best.

  19. Knowledge discovery: Extracting usable information from large amounts of data

    International Nuclear Information System (INIS)

    Whiteson, R.

    1998-01-01

    The threat of nuclear weapons proliferation is a problem of worldwide concern. Safeguards are the key to nuclear nonproliferation and data is the key to safeguards. The safeguards community has access to a huge and steadily growing volume of data. The advantages of this data-rich environment are obvious: there is a great deal of information which can be utilized. The challenge is to effectively apply proven and developing technologies to find and extract usable information from that data. That information must then be assessed and evaluated to produce the knowledge needed for crucial decision making. Efficient and effective analysis of safeguards data will depend on utilizing technologies to interpret the large, heterogeneous data sets that are available from diverse sources. With an order-of-magnitude increase in the amount of data from a wide variety of technical, textual, and historical sources, there is a vital need to apply advanced computer technologies to support all-source analysis. There are techniques of data warehousing, data mining, and data analysis that can provide analysts with tools that will expedite their extracting usable information from the huge amounts of data to which they have access. Computerized tools can aid analysts by integrating heterogeneous data, evaluating diverse data streams, automating retrieval of database information, prioritizing inputs, reconciling conflicting data, doing preliminary interpretations, discovering patterns or trends in data, and automating some of the simpler prescreening tasks that are time-consuming and tedious. Thus knowledge discovery technologies can provide a foundation of support for the analyst. Rather than spending time sifting through often irrelevant information, analysts could use their specialized skills in a focused, productive fashion. This would allow them to make their analytical judgments with more confidence and spend more of their time doing what they do best.

  20. Evolving spectral transformations for multitemporal information extraction using evolutionary computation

    Science.gov (United States)

    Momm, Henrique; Easson, Greg

    2011-01-01

    Remote sensing plays an important role in assessing temporal changes in land features. The challenge often resides in the conversion of large quantities of raw data into actionable information in a timely and cost-effective fashion. To address this issue, research was undertaken to develop an innovative methodology integrating biologically-inspired algorithms with standard image classification algorithms to improve information extraction from multitemporal imagery. Genetic programming was used as the optimization engine to evolve feature-specific candidate solutions in the form of nonlinear mathematical expressions of the image spectral channels (spectral indices). The temporal generalization capability of the proposed system was evaluated by addressing the task of building rooftop identification from a set of images acquired at different dates in a cross-validation approach. The proposed system generates robust solutions (kappa values > 0.75 for stage 1 and > 0.4 for stage 2) despite the statistical differences between the scenes caused by land use and land cover changes coupled with variable environmental conditions, and the lack of radiometric calibration between images. Based on our results, the use of nonlinear spectral indices enhanced the spectral differences between features improving the clustering capability of standard classifiers and providing an alternative solution for multitemporal information extraction.

  1. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants

    OpenAIRE

    Li, An-Dong; Li, Li-Guan; Zhang, Tong

    2015-01-01

    Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer, they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge and digested sludge of two wastewater treatment plants. Compared with the metagenomes of the total DNA extracted from the same sectors of the...

  2. Protein structure determination using metagenome sequence data.

    Science.gov (United States)

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A; Kim, David E; Kamisetty, Hetunandan; Kyrpides, Nikos C; Baker, David

    2017-01-20

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost. Copyright © 2017, American Association for the Advancement of Science.

  3. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules...... for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological...

  4. A High Accuracy Method for Semi-supervised Information Extraction

    Energy Technology Data Exchange (ETDEWEB)

    Tratz, Stephen C.; Sanfilippo, Antonio P.

    2007-04-22

    Customization to specific domains of discourse and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably effective, both rule-based and supervised machine learning approaches to IE customization pose too high a burden on the user. Semi-supervised learning approaches may in principle offer a more resource effective solution but are still insufficiently accurate to grant realistic application. We demonstrate that this limitation can be overcome by integrating fully-supervised learning techniques within a semi-supervised IE approach, without increasing resource requirements.

  5. Metagenomic analysis of kimchi, a traditional Korean fermented food.

    Science.gov (United States)

    Jung, Ji Young; Lee, Se Hee; Kim, Jeong Myeong; Park, Moon Su; Bae, Jin-Woo; Hahn, Yoonsoo; Madsen, Eugene L; Jeon, Che Ok

    2011-04-01

    Kimchi, a traditional food in the Korean culture, is made from vegetables by fermentation. In this study, metagenomic approaches were used to monitor changes in bacterial populations, metabolic potential, and overall genetic features of the microbial community during the 29-day fermentation process. Metagenomic DNA was extracted from kimchi samples obtained periodically and was sequenced using a 454 GS FLX Titanium system, which yielded a total of 701,556 reads, with an average read length of 438 bp. Phylogenetic analysis based on 16S rRNA genes from the metagenome indicated that the kimchi microbiome was dominated by members of three genera: Leuconostoc, Lactobacillus, and Weissella. Assignment of metagenomic sequences to SEED categories of the Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) server revealed a genetic profile characteristic of heterotrophic lactic acid fermentation of carbohydrates, which was supported by the detection of mannitol, lactate, acetate, and ethanol as fermentation products. When the metagenomic reads were mapped onto the database of completed genomes, the Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 and Lactobacillus sakei subsp. sakei 23K genomes were highly represented. These same two genera were confirmed to be important in kimchi fermentation when the majority of kimchi metagenomic sequences showed very high identity to Leuconostoc mesenteroides and Lactobacillus genes. Besides microbial genome sequences, a surprisingly large number of phage DNA sequences were identified from the cellular fractions, possibly indicating that a high proportion of cells were infected by bacteriophages during fermentation. Overall, these results provide insights into the kimchi microbial community and also shed light on fermentation processes carried out broadly by complex microbial communities.

  6. Karst rocky desertification information extraction with EO-1 Hyperion data

    Science.gov (United States)

    Yue, Yuemin; Wang, Kelin; Zhang, Bing; Jiao, Quanjun; Yu, Yizun

    2008-12-01

    Karst rocky desertification is a special kind of land desertification that develops under severe human impacts on the vulnerable eco-geo-environment of karst ecosystems. The process of karst rocky desertification results in simultaneous and complex variations of many interrelated soil, rock and vegetation biogeophysical parameters, rendering it difficult to develop simple and robust remote sensing mapping and monitoring approaches. In this study, we aimed to use Earth Observing 1 (EO-1) Hyperion hyperspectral data to extract karst rocky desertification information. A spectral unmixing model based on a Monte Carlo approach was employed to quantify the fractional cover of photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV) and bare substrates. The results showed that the SWIR (1.9-2.35 μm) portions of the spectrum differed significantly among PV, NPV and bare rock spectral properties. Using the full optical range or only the SWIR (1.9-2.35 μm) region of Hyperion to decompose the image into PV, NPV and bare substrate covers has limitations. However, when the tied-SWIR was used, the sub-pixel fractional covers of PV, NPV and bare substrates were accurately estimated. Our study indicates that the "tied-spectrum" method effectively accentuates the spectral characteristics of materials, while the spectral unmixing model based on a Monte Carlo approach is a useful tool to automatically extract mixed ground objects in karst ecosystems. Karst rocky desertification information can be accurately extracted with EO-1 Hyperion. Imaging spectroscopy can provide a powerful methodology toward understanding the extent and spatial pattern of land degradation in karst ecosystems.

  7. Automated extraction of chemical structure information from digital raster images

    Science.gov (United States)

    Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro

    2009-01-01

    Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper aims to provide critical reviews for these systems and also report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy in extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research

  8. Automated extraction of chemical structure information from digital raster images

    Directory of Open Access Journals (Sweden)

    Shedden Kerby A

    2009-02-01

    Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper aims to provide critical reviews for these systems and also report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy in extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links

  9. Random whole metagenomic sequencing for forensic discrimination of soils.

    Science.gov (United States)

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  10. Random whole metagenomic sequencing for forensic discrimination of soils.

    Directory of Open Access Journals (Sweden)

    Anastasia S Khodakova

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  11. Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    NARCIS (Netherlands)

    Habib, Mena Badieh; van Keulen, Maurice

    2011-01-01

    Information extraction, data integration, and uncertain data management are distinct areas of research that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration

  12. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

    Directory of Open Access Journals (Sweden)

    Pride David T

    2008-09-01

    Background: Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results: From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of

  13. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

    Science.gov (United States)

    Pride, David T; Schoenfeld, Thomas

    2008-09-17

    Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs
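
    The abstract describes comparing fragments to a database of oligonucleotide signatures but does not give the GSPC algorithm itself. The sketch below is therefore only a generic illustration of the idea, not the authors' method: it assumes a signature is a vector of k-mer frequencies and that the closest reference is the one with the smallest L1 distance. All function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter
from itertools import product


def signature(seq, k=4):
    """Return the frequency of every A/C/G/T k-mer in seq as a fixed-order list."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only unambiguous k-mers count toward the total, so N-containing words are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """L1 distance between two signatures (an assumed choice of metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest to the fragment's."""
    frag = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag, signature(references[name], k)),
    )
```

    For example, with `references = {"at_rich": "ATATATATTTAAATAT" * 5, "gc_rich": "GCGCGGCCGCGCCGGC" * 5}`, the call `classify("ATATTTAAATATAT", references, k=2)` selects `"at_rich"`, because the fragment shares no dinucleotides with the GC-rich reference.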

  14. Towards standards for human fecal sample processing in metagenomic studies.

    Science.gov (United States)

    Costea, Paul I; Zeller, Georg; Sunagawa, Shinichi; Pelletier, Eric; Alberti, Adriana; Levenez, Florence; Tramontano, Melanie; Driessen, Marja; Hercog, Rajna; Jung, Ferris-Elias; Kultima, Jens Roat; Hayward, Matthew R; Coelho, Luis Pedro; Allen-Vercoe, Emma; Bertrand, Laurie; Blaut, Michael; Brown, Jillian R M; Carton, Thomas; Cools-Portier, Stéphanie; Daigneault, Michelle; Derrien, Muriel; Druesne, Anne; de Vos, Willem M; Finlay, B Brett; Flint, Harry J; Guarner, Francisco; Hattori, Masahira; Heilig, Hans; Luna, Ruth Ann; van Hylckama Vlieg, Johan; Junick, Jana; Klymiuk, Ingeborg; Langella, Philippe; Le Chatelier, Emmanuelle; Mai, Volker; Manichanh, Chaysavanh; Martin, Jennifer C; Mery, Clémentine; Morita, Hidetoshi; O'Toole, Paul W; Orvain, Céline; Patil, Kiran Raosaheb; Penders, John; Persson, Søren; Pons, Nicolas; Popova, Milena; Salonen, Anne; Saulnier, Delphine; Scott, Karen P; Singh, Bhagirath; Slezak, Kathleen; Veiga, Patrick; Versalovic, James; Zhao, Liping; Zoetendal, Erwin G; Ehrlich, S Dusko; Dore, Joel; Bork, Peer

    2017-11-01

    Technical variation in metagenomic analysis must be minimized to confidently assess the contributions of microbiota to human health. Here we tested 21 representative DNA extraction protocols on the same fecal samples and quantified differences in observed microbial community composition. We compared them with differences due to library preparation and sample storage, which we contrasted with observed biological variation within the same specimen or within an individual over time. We found that DNA extraction had the largest effect on the outcome of metagenomic analysis. To rank DNA extraction protocols, we considered resulting DNA quantity and quality, and we ascertained biases in estimates of community diversity and the ratio between Gram-positive and Gram-negative bacteria. We recommend a standardized DNA extraction method for human fecal samples, for which transferability across labs was established and which was further benchmarked using a mock community of known composition. Its adoption will improve comparability of human gut microbiome studies and facilitate meta-analyses.

  15. Assembling large, complex environmental metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    Howe, A. C. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences; Jansson, J. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division; Malfatti, S. A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Tringe, S. G. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Tiedje, J. M. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences; Brown, C. T. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Computer Science and Engineering

    2012-12-28

    The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.
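
    The abstract names digital normalization without describing it. As commonly described, a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a coverage cutoff. The sketch below assumes that rule and uses an exact in-memory counter instead of the memory-efficient counting structure a real implementation would need; the function names and default parameters are illustrative, not taken from the authors' released code.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """All overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer abundance is below cutoff.

    Abundances are counted over previously kept reads, so redundant
    high-coverage reads are discarded and low-coverage reads survive.
    """
    counts = Counter()
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read shorter than k: nothing to estimate coverage from
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)
    return kept
```

    With ten copies of one read followed by a single unrelated read and `cutoff=3`, only the first three copies and the unrelated read are retained, which is the data reduction the abstract refers to.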

  16. Automated Extraction of Substance Use Information from Clinical Texts.

    Science.gov (United States)

    Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B

    2015-01-01

    Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance when augmented with linguistic resources and domain knowledge when applied to a wide breadth of substance use free text clinical notes.
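
    The F-scores above are reported on a 0 to 100 scale. Assuming they are the balanced F1 measure (the abstract does not state the weighting), they follow from true-positive, false-positive, and false-negative counts as in this sketch; the function name and the counts in the usage note are made up for illustration.

```python
def f_score(true_pos, false_pos, false_neg):
    """Balanced F1 score as a percentage, from raw detection counts."""
    if true_pos == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 100 * 2 * precision * recall / (precision + recall)
```

    For instance, 8 correct detections with 2 spurious and 2 missed statements give precision and recall of 0.8 each, and therefore an F-score of 80.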

  17. Domain-independent information extraction in unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.

    1996-09-01

    Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development Project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks in completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.

  18. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    Science.gov (United States)

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  19. Extraction of neutron spectral information from Bonner-Sphere data

    CERN Document Server

    Haney, J H; Zaidins, C S

    1999-01-01

    We have extended a least-squares method of extracting neutron spectral information from Bonner-Sphere data which was previously developed by Zaidins et al. (Med. Phys. 5 (1978) 42). A pulse-height analysis with background stripping is employed which provided a more accurate count rate for each sphere. Newer response curves by Mares and Schraube (Nucl. Instr. and Meth. A 366 (1994) 461) were included for the moderating spheres and the bare detector which comprise the Bonner spectrometer system. Finally, the neutron energy spectrum of interest was divided using the philosophy of fuzzy logic into three trapezoidal regimes corresponding to slow, moderate, and fast neutrons. Spectral data was taken using a PuBe source in two different environments and the analyzed data is presented for these cases as slow, moderate, and fast neutron fluences. (author)
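
    The division of the spectrum into three trapezoidal regimes can be pictured with a standard trapezoidal membership function. The abstract gives no breakpoints, so the function below is a generic sketch and the numbers in the usage note are purely hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.

    Requires a <= b <= c <= d. In this setting x would be a neutron energy
    (or its logarithm) and a..d the edges of one regime.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge
```

    With hypothetical edges `(0, 2, 8, 10)`, a value of 5 belongs fully to the regime, while values of 1 and 9 each have membership 0.5. Overlapping falling and rising edges of adjacent regimes let an energy near a boundary count partly as slow and partly as moderate.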

  20. Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics

    Science.gov (United States)

    Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara

    2014-01-01

    Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426

  1. Information extraction and knowledge graph construction from geoscience literature

    Science.gov (United States)

    Wang, Chengbin; Ma, Xiaogang; Chen, Jianguo; Chen, Jingwen

    2018-03-01

    Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there are limited works on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining the generic and geology terms from geology dictionaries to train Chinese word segmentation rules of the Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed the stop-words from the segmentation results to get a corpus constituted of content-words. Third, we used a statistical method to analyze the semantic links between content-words, and we selected the chord and bigram graphs to visualize the content-words and their links as nodes and edges in a knowledge graph, respectively. The resulting graph presents a clear overview of key information in an unstructured document. This study proves the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
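
    The second and third steps of the workflow (stop-word removal, then linking adjacent content-words) can be sketched as follows. The sketch assumes the text has already been segmented into tokens, since the Conditional Random Fields segmenter is not reproduced here, and it uses plain adjacent-pair counts as a stand-in for the paper's unspecified statistical method. Function names and the sample tokens are hypothetical.

```python
from collections import Counter


def content_words(tokens, stop_words):
    """Drop stop-words from an already segmented token list."""
    return [t for t in tokens if t not in stop_words]


def bigram_edges(documents, stop_words, min_count=1):
    """Count adjacent content-word pairs across documents.

    Returns {(word_a, word_b): count}; each key is a directed edge between
    two nodes of the graph, weighted by how often the pair occurs.
    """
    edges = Counter()
    for tokens in documents:
        words = content_words(tokens, stop_words)
        edges.update(zip(words, words[1:]))
    return {pair: n for pair, n in edges.items() if n >= min_count}
```

    Raising `min_count` prunes rare pairs, which keeps the resulting graph readable when it is drawn as a chord or bigram diagram.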

  2. Reconstruction of bacterial and viral genomes from multiple metagenomes

    Directory of Open Access Journals (Sweden)

    Vineet K Sharma

    2016-04-01

    Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach is proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1,156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. More than 80% complete draft genome sequences could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach along with spiking was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments.

  3. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes.

    Science.gov (United States)

    Gupta, Ankit; Kumar, Sanjiv; Prasoodanan, Vishnu P K; Harish, K; Sharma, Ashok K; Sharma, Vineet K

    2016-01-01

    Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach is proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. More than 80% complete draft genome sequences could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach along with spiking was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments.

  4. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  5. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    Science.gov (United States)

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  6. Metagenomes obtained by "deep sequencing" - what do they tell about the EBPR communities

    DEFF Research Database (Denmark)

    Albertsen, Mads; Saunders, Aaron Marc; Nielsen, Kåre Lehmann

    Keywords: Metagenomics; Accumulibacter; Micro-diversity; Enhanced Biological Phosphorus Removal. Introduction: Metagenomics, or environmental genomics, provides comprehensive information about the entire microbial community of a certain ecosystem, e.g. a wastewater treatment plant. So far......, metagenomic analyses have been hampered by high costs and the high level of expertise needed to conduct the investigations, but this is changing with the development of new technologies allowing analyses of billions of DNA sequences (deep-sequencing) and user-friendly pipelines for analyses of the huge data sets...... in Albertsen et al., (2011). Results and Discussion: We sequenced two metagenomes from Aalborg East and West EBPR wastewater treatment plants at a depth of 12 and 8 Gb using Illumina short read sequencing. The EBPR plants form a distinct group when compared to metagenomes from a wide range of environments, both...

  7. Data Assimilation to Extract Soil Moisture Information from SMAP Observations

    Directory of Open Access Journals (Sweden)

    Jana Kolassa

    2017-11-01

    This study compares different methods to extract soil moisture information through the assimilation of Soil Moisture Active Passive (SMAP) observations. Neural network (NN) and physically-based SMAP soil moisture retrievals were assimilated into the National Aeronautics and Space Administration (NASA) Catchment model over the contiguous United States for April 2015 to March 2017. By construction, the NN retrievals are consistent with the global climatology of the Catchment model soil moisture. Assimilating the NN retrievals without further bias correction improved the surface and root zone correlations against in situ measurements from 14 SMAP core validation sites (CVS) by 0.12 and 0.16, respectively, over the model-only skill, and reduced the surface and root zone unbiased root-mean-square error (ubRMSE) by 0.005 m³ m⁻³ and 0.001 m³ m⁻³, respectively. The assimilation reduced the average absolute surface bias against the CVS measurements by 0.009 m³ m⁻³, but increased the root zone bias by 0.014 m³ m⁻³. Assimilating the NN retrievals after a localized bias correction yielded slightly lower surface correlation and ubRMSE improvements, but generally the skill differences were small. The assimilation of the physically-based SMAP Level-2 passive soil moisture retrievals using a global bias correction yielded similar skill improvements, as did the direct assimilation of locally bias-corrected SMAP brightness temperatures within the SMAP Level-4 soil moisture algorithm. The results show that global bias correction methods may be able to extract more independent information from SMAP observations compared to local bias correction methods, but without accurate quality control and observation error characterization they are also more vulnerable to adverse effects from retrieval errors related to uncertainties in the retrieval inputs and algorithm. Furthermore, the results show that using global bias correction approaches without a
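
    The two skill metrics quoted in this abstract, bias and ubRMSE, can be sketched as follows. This uses the commonly cited definition of ubRMSE (the RMSE computed after each series' own mean has been removed). The function names and the toy values are assumptions for illustration and are not taken from the study.

```python
import math


def bias(estimates, observations):
    """Mean difference between estimates and in situ observations."""
    return sum(e - o for e, o in zip(estimates, observations)) / len(estimates)


def ubrmse(estimates, observations):
    """Unbiased RMSE: RMSE of the anomalies after removing each series' mean."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    squared = [
        ((e - mean_e) - (o - mean_o)) ** 2
        for e, o in zip(estimates, observations)
    ]
    return math.sqrt(sum(squared) / n)


# Toy soil moisture series in m3 m-3 (invented values): the estimates track
# the observations perfectly except for a constant wet offset.
est = [0.20, 0.25, 0.30]
obs = [0.15, 0.20, 0.25]

print(bias(est, obs))    # the constant offset shows up as bias
print(ubrmse(est, obs))  # and is excluded from ubRMSE
```

    The toy series shows why the two metrics are reported separately: a constant offset affects the bias but leaves ubRMSE at essentially zero.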

  8. Ancient DNA analysis identifies marine mollusc shells as new metagenomic archives of the past.

    Science.gov (United States)

    Der Sarkissian, Clio; Pichereau, Vianney; Dupont, Catherine; Ilsøe, Peter C; Perrigault, Mickael; Butler, Paul; Chauvaud, Laurent; Eiríksson, Jón; Scourse, James; Paillard, Christine; Orlando, Ludovic

    2017-09-01

    Marine mollusc shells enclose a wealth of information on coastal organisms and their environment. Their life history traits as well as (palaeo-) environmental conditions, including temperature, food availability, salinity and pollution, can be traced through the analysis of their shell (micro-) structure and biogeochemical composition. Adding to this list, the DNA entrapped in shell carbonate biominerals potentially offers a novel and complementary proxy both for reconstructing palaeoenvironments and tracking mollusc evolutionary trajectories. Here, we assess this potential by applying DNA extraction, high-throughput shotgun DNA sequencing and metagenomic analyses to marine mollusc shells spanning the last ~7,000 years. We report successful DNA extraction from shells, including a variety of ancient specimens, and find that DNA recovery is highly dependent on their biomineral structure, carbonate layer preservation and disease state. We demonstrate positive taxonomic identification of mollusc species using a combination of mitochondrial DNA genomes, barcodes, genome-scale data and metagenomic approaches. We also find shell biominerals to contain a diversity of microbial DNA from the marine environment. Finally, we reconstruct genomic sequences of organisms closely related to the Vibrio tapetis bacteria from Manila clam shells previously diagnosed with Brown Ring Disease. Our results reveal marine mollusc shells as novel genetic archives of the past, which opens new perspectives in ancient DNA research, with the potential to reconstruct the evolutionary history of molluscs, microbial communities and pathogens in the face of environmental changes. Other future applications include conservation of endangered mollusc species and aquaculture management. © 2017 John Wiley & Sons Ltd.

  9. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Science.gov (United States)

    Bose, Tungadri; Haque, Mohammed Monzoorul; Reddy, Cvsk; Mande, Sharmila S

    2015-01-01

    Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools necessitate end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although, web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicate the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the flexibility of

  10. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Tungadri Bose

    Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools necessitate end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although, web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicate the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the

  11. Extracting critical information from group members' partial knowledge using the Searching Concealed Information Test.

    Science.gov (United States)

    Elaad, Eitan

    2016-12-01

    The Concealed Information Test (CIT) is a psychophysiological method designed to detect information that an individual cannot or does not wish to reveal. The present study used a version of the CIT, the Searching Concealed Information Test (SCIT), to extract information from partial information that participants possessed on a planned jailbreak. In the first experiment, 52 undergraduate students were randomly, but not equally, allocated into 15 different clusters of partial knowledge. In each, participants possessed knowledge about 2 of 6 critical items. Using a lenient decision rule, and a combined measure defined as the mean of 3 individual measures (skin conductance response amplitude, finger pulse, and respiration line length) 5 of the 6 critical items were identified. Experiment 2 extended the first experiment to unequal proportions of critical knowledge. Forty-six undergraduate students were randomly allocated into 25 clusters of partial knowledge in which 0, 1, 2, 3, or 6 pieces of information were known. Using the same lenient decision rule and the combined measure, all 6 items were identified. It was suggested that the Group SCIT is capable of assembling a comprehensive picture out of partial information possessed by informed innocent participants. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  12. The soil microbiome — from metagenomics to metaphenomics

    Energy Technology Data Exchange (ETDEWEB)

    Jansson, Janet K.; Hofmockel, Kirsten S.

    2018-06-01

    Soil microorganisms carry out important processes, including support of plant growth and cycling of carbon and other nutrients. However, the majority of soil microbes have not yet been isolated and their functions are largely unknown. Although metagenomic sequencing reveals microbial identities and functional gene information, it includes DNA from microbes with vastly varying physiological states. Therefore, metagenomics is only predictive of community functional potential. We posit that the next frontier lies in understanding the metaphenome, the product of the combined genetic potential of the microbiome and available resources. Here we describe examples of opportunities towards gaining understanding of the soil metaphenome.

  13. Earth Science Data Analytics: Preparing for Extracting Knowledge from Information

    Science.gov (United States)

    Kempler, Steven; Barbieri, Lindsay

    2016-01-01

    Data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Data analytics is a broad term that includes data analysis, as well as an understanding of the cognitive processes an analyst uses to understand problems and explore data in meaningful ways. Analytics also include data extraction, transformation, and reduction, utilizing specific tools, techniques, and methods. Turning to data science, definitions of data science sound very similar to those of data analytics (which leads to a lot of the confusion between the two). But the skills needed for both, co-analyzing large amounts of heterogeneous data, understanding and utilizing relevant tools and techniques, and subject matter expertise, although similar, serve different purposes. Data Analytics takes a practitioner's approach to applying expertise and skills to solve issues and gain subject knowledge. Data Science is more theoretical (research in itself) in nature, providing strategic actionable insights and new innovative methodologies. Earth Science Data Analytics (ESDA) is the process of examining, preparing, reducing, and analyzing large amounts of spatial (multi-dimensional), temporal, or spectral data using a variety of data types to uncover patterns, correlations and other information, to better understand our Earth. The large variety of datasets (temporal spatial differences, data types, formats, etc.) calls for data analytics skills that combine an understanding of the science domain with data preparation, reduction, and analysis techniques, from a practitioner's point of view. The application of these skills to ESDA is the focus of this presentation. The Earth Science Information Partners (ESIP) Federation Earth Science Data Analytics (ESDA) Cluster was created in recognition of the practical need to facilitate the co-analysis of large amounts of data and information for Earth science. Thus, from a to

  14. Extracting information in spike time patterns with wavelets and information theory.

    Science.gov (United States)

    Lopes-dos-Santos, Vítor; Panzeri, Stefano; Kayser, Christoph; Diamond, Mathew E; Quian Quiroga, Rodrigo

    2015-02-01

    We present a new method to assess the information carried by temporal patterns in spike trains. The method first performs a wavelet decomposition of the spike trains, then uses Shannon information to select a subset of coefficients carrying information, and finally assesses timing information in terms of decoding performance: the ability to identify the presented stimuli from spike train patterns. We show that the method allows: 1) a robust assessment of the information carried by spike time patterns even when this is distributed across multiple time scales and time points; 2) an effective denoising of the raster plots that improves the estimate of stimulus tuning of spike trains; and 3) an assessment of the information carried by temporally coordinated spikes across neurons. Using simulated data, we demonstrate that the Wavelet-Information (WI) method performs better and is more robust to spike time-jitter, background noise, and sample size than well-established approaches, such as principal component analysis, direct estimates of information from digitized spike trains, or a metric-based method. Furthermore, when applied to real spike trains from monkey auditory cortex and from rat barrel cortex, the WI method allows extracting larger amounts of spike timing information. Importantly, the fact that the WI method incorporates multiple time scales makes it robust to the choice of partly arbitrary parameters such as temporal resolution, response window length, number of response features considered, and the number of available trials. These results highlight the potential of the proposed method for accurate and objective assessments of how spike timing encodes information. Copyright © 2015 the American Physiological Society.

  15. Network construction and structure detection with metagenomic count data.

    Science.gov (United States)

    Liu, Zhenqiu; Lin, Shili; Piantadosi, Steven

    2015-01-01

    The human microbiome plays a critical role in human health. Massive amounts of metagenomic data have been generated with advances in next-generation sequencing technologies that characterize microbial communities via direct isolation and sequencing. How to extract, analyze, and transform these vast amounts of data into useful knowledge is a great challenge to bioinformaticians. Microbial biodiversity research has focused primarily on taxa composition and abundance and less on the co-occurrences among different taxa. However, taxa co-occurrences and their relationships to environmental and clinical conditions are important because network structure may help to understand how microbial taxa function together. We propose a systematic robust approach for bacteria network construction and structure detection using metagenomic count data. Pairwise similarity/distance measures between taxa are proposed by adapting distance measures for samples in ecology. We also extend the sparse inverse covariance approach to a sparse inverse of a similarity matrix from count data for network construction. Our approach is efficient for large metagenomic count data with thousands of bacterial taxa. We evaluate our method with real and simulated data. Our method identifies true and biologically significant network structures efficiently. Network analysis is crucial for detecting subnetwork structures with metagenomic count data. We developed a software tool in MATLAB for network construction and biologically significant module detection. Software MetaNet can be downloaded from http://biostatistics.csmc.edu/MetaNet/.
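
    The idea of adapting an ecological distance measure to compare taxa rather than samples can be sketched as below. The sketch applies Bray-Curtis similarity to each pair of taxon count vectors and keeps pairs above a threshold as network edges. It is only a simple stand-in: it does not implement the sparse inverse similarity approach described in the abstract, it is not MetaNet, and the taxon names, counts, and threshold are invented.

```python
from itertools import combinations


def bray_curtis_similarity(u, v):
    """1 - Bray-Curtis dissimilarity between two non-negative count vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two all-zero taxa carry no co-occurrence signal
    shared = sum(min(a, b) for a, b in zip(u, v))
    return 2.0 * shared / total


def cooccurrence_edges(counts, threshold):
    """Return {(taxon_a, taxon_b): similarity} for pairs at or above threshold."""
    edges = {}
    for a, b in combinations(sorted(counts), 2):
        similarity = bray_curtis_similarity(counts[a], counts[b])
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges


# Hypothetical counts: one list entry per sample, one row per taxon.
counts = {
    "taxonA": [10, 0, 5, 20],
    "taxonB": [8, 1, 6, 18],
    "taxonC": [0, 30, 0, 1],
}
edges = cooccurrence_edges(counts, threshold=0.5)
print(edges)
```

    With these toy counts only taxonA and taxonB, which rise and fall together across samples, end up connected.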

  16. A COMPREHENSIVE STUDY ON TEXT INFORMATION EXTRACTION FROM NATURAL SCENE IMAGES

    OpenAIRE

    Anit V. Manjaly; B. Shanmuga Priya

    2016-01-01

    In Text Information Extraction (TIE) process, the text regions are localized and extracted from the images. It is an active research problem in computer vision applications. Diversity in text is due to the differences in size, style, orientation, alignment of text, low image contrast and complex backgrounds. The semantic information provided by an image can be used in different applications such as content based image retrieval, sign board identification etc. Text information extraction compr...

  17. Gene prediction in metagenomic fragments: a large scale machine learning approach.

    Science.gov (United States)

    Hoff, Katharina J; Tech, Maike; Lingner, Thomas; Daniel, Rolf; Morgenstern, Burkhard; Meinicke, Peter

    2008-04-28

    Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the

  18. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene

  19. Medicaid Analytic eXtract (MAX) General Information

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicaid Analytic eXtract (MAX) data is a set of person-level data files on Medicaid eligibility, service utilization, and payments. The MAX data are created to...

  20. Automating Information Extraction from 3-D Scan Data

    National Research Council Canada - National Science Library

    Bradtmiller, Bruce

    1998-01-01

    ... and 7.2, newly developed software for extracting body measurements from 3-D scans. Investigators used traditional methods to measure 123 male and female subjects for 21 dimensions associated with the sizing and design of military clothing...

  1. Fizzy: feature subset selection for metagenomics.

    Science.gov (United States)

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
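
    As a rough illustration of information-theoretic feature selection, the sketch below ranks features by their mutual information with the class label, which is the simplest criterion of this family. It is not Fizzy's command line interface or its implementation. The OTU names, the presence/absence discretization, and the labels are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(features, labels, k):
    """Return the k feature names with the highest mutual information."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical presence/absence profiles of three OTUs in four samples.
labels = ["young", "young", "old", "old"]
features = {
    "otu_1": [1, 1, 0, 0],  # perfectly tracks the label
    "otu_2": [1, 0, 1, 0],  # independent of the label
    "otu_3": [1, 1, 1, 0],  # partially informative
}
top = rank_features(features, labels, k=2)
print(top)
```

    Ranking by relevance alone ignores redundancy between selected features, which more elaborate information-theoretic criteria also take into account.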

  2. The extraction of coastal windbreak forest information based on UAV remote sensing images

    Science.gov (United States)

    Shang, Weitao; Gao, Zhiqiang; Jiang, Xiaopeng; Chen, Maosi

    2017-09-01

    Unmanned aerial vehicles (UAVs) have been increasingly used for natural resource applications in recent years as a result of their greater availability, the miniaturization of sensors, and the ability to deploy them relatively quickly and repeatedly at low altitudes. UAV remote sensing images offer rich information, including spatial, spectral and contextual information. To extract information from these images, we need to use the spatial and contextual information of an object and its surroundings. Pixel-based approaches use only spectral information, so information extraction rests exclusively on gray-level thresholding methods. This limitation becomes more severe when only certain features are to be extracted from UAV remote sensing images. To overcome it, an object-oriented approach is implemented to extract coastal windbreak forest information from UAV remote sensing images. First, the images are segmented. Second, the spectral and geometric information of the image objects is applied to build a knowledge base for coastal windbreak forest extraction. Third, the extraction results are refined and completed. The results show that the proposed method extracts coastal windbreak forest more accurately than the pixel-oriented method. In this study, the overall accuracy of the classified image is 0.94 and the Kappa coefficient is 0.92.
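
    The two accuracy figures reported at the end of this abstract are conventionally derived from a confusion matrix of classified versus reference pixels or objects. A minimal sketch of that calculation follows. The confusion matrix is invented for illustration and does not reproduce the study's data.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Agreement beyond chance: (po - pe) / (1 - pe).

    Undefined (division by zero) when chance agreement pe equals 1.
    """
    total = sum(sum(row) for row in matrix)
    po = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (po - pe) / (1 - pe)


# Hypothetical 2-class matrix (rows: reference, columns: classified),
# e.g. windbreak forest versus everything else.
matrix = [
    [45, 5],
    [5, 45],
]
print(overall_accuracy(matrix))
print(cohens_kappa(matrix))
```

    Kappa is lower than overall accuracy here because it discounts the agreement that two balanced classes would reach by chance.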

  3. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    Science.gov (United States)

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  4. Captured metagenomics: large-scale targeting of genes based on 'sequence capture' reveals functional diversity in soils.

    Science.gov (United States)

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K; Hedlund, Katarina; Ahrén, Dag

    2015-12-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  5. Metagenomic Diagnosis of Bacterial Infections

    Science.gov (United States)

    Nakamura, Shota; Maeda, Norihiro; Miron, Ionut Mihai; Yoh, Myonsun; Izutsu, Kaori; Kataoka, Chidoh; Honda, Takeshi; Yasunaga, Teruo; Nakaya, Takaaki; Kawai, Jun; Hayashizaki, Yoshihide; Horii, Toshihiro

    2008-01-01

    To test the ability of high-throughput DNA sequencing to detect bacterial pathogens, we used it on DNA from a patient’s feces during and after diarrheal illness. Sequences showing best matches for Campylobacter jejuni were detected only in the illness sample. Various bacteria may be detectable with this metagenomic approach. PMID:18976571

  6. Information Extraction with Character-level Neural Networks and Free Noisy Supervision

    OpenAIRE

    Meerkamp, Philipp; Zhou, Zhengyi

    2016-01-01

    We present an architecture for information extraction from text that augments an existing parser with a character-level neural network. The network is trained using a measure of consistency of extracted data with existing databases as a form of noisy supervision. Our architecture combines the ability of constraint-based information extraction systems to easily incorporate domain knowledge and constraints with the ability of deep neural networks to leverage large amounts of data to learn compl...

  7. Towards an information extraction and knowledge formation framework based on Shannon entropy

    Directory of Open Access Journals (Sweden)

    Iliescu Dragoș

    2017-01-01

    This paper addresses the quantification of information, taking the specific domain of nonconforming product management as the information source. The work is a case study. Raw data were gathered from a heavy industrial works company, with information extraction and knowledge formation considered here. The method used to estimate information quantity is based on the Shannon entropy formula. The information and entropy spectra are decomposed and analysed to extract specific information and to support knowledge-that formation. The results of the entropy analysis point out the information that the organisation involved needs to acquire, which is presented as a specific knowledge type.
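
    The Shannon entropy formula referred to above is H = -Σ p_i log2 p_i, taken over the observed categories. A minimal sketch follows; the defect categories are hypothetical and are not the company data used in the study.

```python
import math
from collections import Counter

def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of the observations."""
    counts = Counter(observations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical nonconformity records, one category label per record.
defects = ["weld", "weld", "paint", "dimension",
           "weld", "paint", "weld", "dimension"]
h = shannon_entropy(defects)
```

    Here the categories occur with probabilities 0.5, 0.25 and 0.25, which gives an entropy of 1.5 bits.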

  8. Marine Metagenome as A Resource for Novel Enzymes

    KAUST Repository

    Alma’abadi, Amani D.

    2015-11-10

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data.

  9. Marine Metagenome as A Resource for Novel Enzymes

    Directory of Open Access Journals (Sweden)

    Amani D. Alma’abadi

    2015-10-01

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to a great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data.

  10. Deployment and Preparation of Metagenomic Analysis on the EELA Grid

    International Nuclear Information System (INIS)

    Aparicio, G.; Blanquer, I.; Hernandez, V.; Pignatelli, M.; Tamames, J.

    2007-01-01

    In many cases, the sequencing of the DNA of many microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone and require interaction with other organisms. In such cases, the available DNA information belongs to different kinds of organisms. Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. This can be achieved through a process of fragment extraction, comparison, and functional analysis. Fragments can be classified by comparison to existing chains whose function is well known. This process is computationally expensive and requires several iterations of alignment and phylogeny classification steps. Source samples reach several million sequences, each of which can reach thousands of nucleotides. These sequences are compared to a selected part of the Non-redundant database which only involves information from eukaryotic species. From this first analysis, a refining process is performed and the alignment analysis is restarted from the results. This process requires several CPU years. An environment has been developed to fragment, automate and check the above operations. This environment has been tuned using an experimental study which tested the most efficient and reliable resources, the optimal job size, and the overhead of data transfer and database reindexing. The environment should re-submit faulty jobs, detect endless tasks and ensure that the results are correctly retrieved and the workflow synchronised. The paper gives an outline of the structure of the system and of the preparation steps performed to deal with this experiment. (Author)
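
    Two of the ideas mentioned above, splitting the input into jobs of a chosen size and re-submitting faulty jobs, can be sketched in a few lines. This is a toy, single-machine illustration and not the EELA environment: all names are hypothetical, the worker failure is simulated, and the detection of endless tasks is not covered.

```python
def split_into_jobs(sequences, job_size):
    """Split a list of sequences into consecutive jobs of at most job_size items."""
    return [sequences[i:i + job_size] for i in range(0, len(sequences), job_size)]

def run_with_resubmission(jobs, worker, max_attempts=3):
    """Run every job, re-submitting a failed job until max_attempts is reached."""
    results, failed = {}, []
    for job_id, job in enumerate(jobs):
        for attempt in range(1, max_attempts + 1):
            try:
                results[job_id] = worker(job)
                break  # job succeeded, move on to the next one
            except RuntimeError:
                if attempt == max_attempts:
                    failed.append(job_id)  # give up on this job
    return results, failed

# Simulated worker: the first submission of each job fails, the second succeeds.
calls = {}

def flaky_worker(job):
    key = tuple(job)
    calls[key] = calls.get(key, 0) + 1
    if calls[key] == 1:
        raise RuntimeError("simulated grid failure")
    return [len(seq) for seq in job]

jobs = split_into_jobs(["ACGT", "AC", "GGGTTT", "A", "CCG"], job_size=2)
results, failed = run_with_resubmission(jobs, flaky_worker)
```

    With a job size of 2, the five sequences are split into three jobs, and every job completes on its second submission.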

  11. Accurate and Automatic Building Roof Extraction Using Neighborhood Information of Point Clouds

    Directory of Open Access Journals (Sweden)

    ZHAO Chuan

    2017-09-01

    High-accuracy building roof extraction from LiDAR data is the key to building the topological relationships of roofs and to reconstructing buildings. To address the poor adaptability and low extraction precision of existing roof extraction methods for complex buildings, an accurate and automatic building roof extraction method using neighborhood information of point clouds is proposed. Point cloud features are calculated by principal component analysis, and reliable seed points are selected after feature histogram construction. Initial roof surfaces are extracted quickly and precisely by the proposed local normal vector distribution density-based spatial clustering of applications with noise (LNVD-DBSCAN). The roof competition problem is solved effectively by a poll model based on neighborhood information. Experimental results show that the proposed method can extract building roofs automatically and precisely and adapts well to buildings of different complexity, providing reliable roof information for building reconstruction.
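
    Computing point features by principal component analysis usually means an eigen-decomposition of the covariance matrix of a point's neighborhood. The sketch below derives a normal vector and a planarity score in this way; the abstract does not list the exact features used, so the planarity definition and the function name are assumptions for illustration.

```python
import numpy as np

def pca_features(neighborhood):
    """Return (unit normal, planarity) for an (n, 3) array of neighboring points."""
    pts = np.asarray(neighborhood, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]        # direction of least variance
    l3, l2, l1 = eigvals          # so that l1 >= l2 >= l3
    # One common eigenvalue-based planarity measure (assumed here).
    planarity = (l2 - l3) / l1 if l1 > 0 else 0.0
    return normal, planarity

# Hypothetical flat roof patch: a 4 x 4 grid of points in the plane z = 0.
grid = np.array([[x, y, 0.0] for x in range(4) for y in range(4)])
normal, planarity = pca_features(grid)
```

    For the flat patch, the estimated normal is parallel to the z-axis (its sign is arbitrary) and the planarity is 1.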

  12. Extracting local information from crowds through betting markets

    Science.gov (United States)

    Weijs, Steven

    2015-04-01

    In this research, a set-up is considered in which users can bet against a forecasting agency to challenge their probabilistic forecasts. From an information theory standpoint, a reward structure is considered that either provides the forecasting agency with better information, paying the successful providers of information for their winning bets, or funds excellent forecasting agencies through users that think they know better. Especially for local forecasts, the approach may help to diagnose model biases and to identify local predictive information that can be incorporated in the models. The challenges and opportunities for implementing such a system in practice are also discussed.
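
    The abstract does not give the reward formula. One classical way to connect betting and information theory, assumed here purely for illustration, is proportional (Kelly-style) betting at the fair odds implied by the agency's forecast: a bettor's expected log-wealth growth is then positive only to the extent that the bettor's probabilities are better than the agency's. All names and numbers below are hypothetical.

```python
import math

def expected_log_growth(truth, bettor, agency):
    """Expected log2 wealth growth per bet.

    The bettor stakes a fraction bettor[i] of wealth on outcome i and is paid
    at the fair odds 1 / agency[i]; truth[i] is the actual outcome probability.
    """
    return sum(r * math.log2(q / p)
               for r, q, p in zip(truth, bettor, agency) if r > 0)

agency = [0.5, 0.5]   # forecast issued by the agency
truth = [0.8, 0.2]    # local outcome frequencies, known only to the informed user

informed = expected_log_growth(truth, truth, agency)     # user bets on the truth
uninformed = expected_log_growth(truth, agency, agency)  # user copies the forecast
```

    The informed user gains about 0.278 bits per bet, which equals the Kullback-Leibler divergence from the agency forecast to the true distribution, while a user who merely copies the forecast gains nothing.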

  13. Tentacle: distributed quantification of genes in metagenomes

    OpenAIRE

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    Background In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Findings Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented...

  14. A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    Directory of Open Access Journals (Sweden)

    Handelsman Jo

    2008-01-01

    Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.
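
    The abstract does not name the individual statistics in the toolbox. As stand-ins chosen for illustration, the sketch below uses the bias-corrected Chao1 estimator for richness and the Jaccard index for shared membership; the counts and family names are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-family abundance counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

def jaccard(a, b):
    """Jaccard similarity between the memberships of two communities."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical counts of peptide fragments per operational protein family.
counts = [10, 4, 2, 2, 1, 1, 1]
richness = chao1(counts)

# Hypothetical family memberships of two habitats.
shared = jaccard({"famA", "famB", "famC"}, {"famB", "famC", "famD"})
```

    With seven observed families, three singletons and two doubletons, the Chao1 estimate is 8.0; the two habitats share two of four families, giving a Jaccard similarity of 0.5.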

  15. Information extraction from topographic map using colour and shape ...

    Indian Academy of Sciences (India)

    Scanning of topographic map provides a comprehensive digital repository of topographic maps developed by Survey of ... Government and agencies capture digital information or convert existing analogous map information. For instance, maps .... transformations from one pattern to another. An ontology driven pattern ...

  16. Metagenomics and the protein universe

    Science.gov (United States)

    Godzik, Adam

    2011-01-01

    Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084

  17. Spoken Language Understanding Systems for Extracting Semantic Information from Speech

    CERN Document Server

    Tur, Gokhan

    2011-01-01

    Spoken language understanding (SLU) is an emerging field in between speech and language processing, investigating human/machine and human/human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances and their applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, usin

  18. Sifting Through Chaos: Extracting Information from Unstructured Legal Opinions.

    Science.gov (United States)

    Oliveira, Bruno Miguel; Guimarães, Rui Vasconcellos; Antunes, Luís; Rodrigues, Pedro Pereira

    2018-01-01

    Abiding by the law is, in some cases, a delicate balance between the rights of different players. Re-using health records is such a case. While the law grants reuse rights to public administration documents, which include health records produced in public health institutions, it also grants privacy to personal records. To safeguard the correct usage of data, public hospitals in Portugal employ jurists who are responsible for allowing or withholding access rights to health records. To help decision making, these jurists can consult the legal opinions issued by the national committee on public administration documents usage. While these legal opinions are of undeniable value, due to their doctrinal contribution, they are only available in a format best suited for printing, forcing individual consultation of each document, with no option whatsoever for clustered search, filtering or indexing, which are standard operations nowadays in a document management system. When deciding on tens of data requests a day, it becomes unfeasible to consult the hundreds of legal opinions already available. With the objective of creating a modern document management system, we devised an open, platform-agnostic system that extracts and compiles the legal opinions, extracts their contents and produces metadata, allowing fast searching and filtering of said legal opinions.

  19. Integrative Workflows for Metagenomic Analysis

    Directory of Open Access Journals (Sweden)

    Efthymios Ladoukakis

    2014-11-01

    The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e. Sanger). From a bioinformatic perspective, this boils down to many gigabytes of data being generated from each single sequencing experiment, rendering data management, or even storage, a critical bottleneck for the overall analytical endeavor. The complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires substantial computational resources, making cloud computing infrastructures the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control and annotation of metagenomic data, embracing various major sequencing technologies and applications.

  20. Extraction of Data from a Hospital Information System to Perform Process Mining.

    Science.gov (United States)

    Neira, Ricardo Alfredo Quintano; de Vries, Gert-Jan; Caffarel, Jennifer; Stretton, Erin

    2017-01-01

    The aim of this work is to share our experience in extracting relevant data from a hospital information system in preparation for a research study using process mining techniques. The steps performed were: research definition, mapping of the normative processes, identification of the table and field names of the database, and extraction of data. We then offer lessons learned during the data extraction phase. Any errors made in the extraction phase will propagate and have implications for subsequent analyses. Thus, it is essential to take the time needed and devote sufficient attention to detail to perform all activities with the goal of ensuring high quality of the extracted data. We hope this work will help other researchers to plan and execute the extraction of data for process mining research studies.

  1. Metagenomics as a tool to obtain full genomes of process-critical bacteria in engineered systems

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Tyson, Gene W.

    parameters with functions of specific bacteria within the ecosystems in order to decipher principles that might be used to control and predict ecosystem performance. The main bottleneck in obtaining genomes from the environment is that the vast majority of bacteria are not readily cultured. Metagenomics...... sequenced two metagenomes from the same environmental sample, but using two independent DNA extraction methods, which resulted in different population abundances. This allowed sequence-composition independent binning of numerous high quality draft genomes from both high and low abundant members...... of the bacteria, including time-series. Using more than two metagenomes increases the binning resolution and hence the number of genomes that can be extracted. We are currently at a tipping point in microbial ecology – in the future it will be fast, cheap and easy to obtain genomes directly from the environment...

  2. [Metagenomics and biodiversity of sphagnum bogs].

    Science.gov (United States)

    Rusin, L Yu

    2016-01-01

    The biodiversity of sphagnum bogs is among the richest and least studied, while these ecosystems rank among the highest in ecological, conservation, and economic value. Recent studies focused on the prokaryotic consortia associated with sphagnum mosses, and revealed the factors that maintain sustainability and productivity of bog ecosystems. High-throughput sequencing technologies provided insight into functional diversity of moss microbial communities (microbiomes), and helped to identify the biochemical pathways and gene families that facilitate the spectrum of adaptive strategies and largely foster the very successful colonization of the Northern hemisphere by sphagnum mosses. Rich and valuable information obtained on microbiomes of peat bogs sets off the paucity of evidence on their eukaryotic diversity. Prospects and expectations of reliable assessment of taxonomic profiles, relative abundance of taxa, and hidden biodiversity of microscopic eukaryotes in sphagnum bog ecosystems are briefly outlined in the context of today's metagenomics.

  3. Extraction of Graph Information Based on Image Contents and the Use of Ontology

    Science.gov (United States)

    Kanjanawattana, Sarunya; Kimura, Masaomi

    2016-01-01

    A graph is an effective form of data representation used to summarize complex information. Explicit information such as the relationship between the X- and Y-axes can be easily extracted from a graph by applying human intelligence. However, implicit knowledge such as information obtained from other related concepts in an ontology also resides in…

  4. Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures.

    Science.gov (United States)

    Humphreys, K; Demetriou, G; Gaizauskas, R

    2000-01-01

    Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area of molecular biology. In particular, we describe how an information extraction system designed to participate in the MUC exercises has been modified for two bioinformatics applications: EMPathIE, concerned with enzyme and metabolic pathways; and PASTA, concerned with protein structure. Progress to date provides convincing grounds for believing that IE techniques will deliver novel and effective ways for scientists to make use of the core literature which defines their disciplines.

  5. Analysis of space-borne data for coastal zone information extraction of Goa Coast, India

    Digital Repository Service at National Institute of Oceanography (India)

    Kunte, P.D.; Wagle, B.G.

    Space-borne data covering the coastal zone of Goa State were processed using digital and visual image-processing techniques to extract information about the coastal zone. Digital image processing of thematic data included principal component...

  6. ADVANCED EXTRACTION OF SPATIAL INFORMATION FROM HIGH RESOLUTION SATELLITE DATA

    Directory of Open Access Journals (Sweden)

    T. Pour

    2016-06-01

    In this paper, the authors processed five satellite images of five different Middle-European cities taken by five different sensors. The aim of the paper was to find methods and approaches leading to the evaluation and extraction of spatial data from areas of interest. For this reason, the data were first pre-processed using image fusion, mosaicking and segmentation. The results passed to the next step were two polygon layers: the first representing single objects and the second representing city blocks. In the second step, the polygon layers were classified and exported into Esri shapefile format. Classification was partly hierarchical and expert-based, and partly based on the SEaTH tool, used for separability analysis and thresholding. Final results along with visual previews were attached to the original thesis. Results are evaluated visually and statistically in the last part of the paper. In the discussion, the authors describe the difficulties of working with large data sets that were taken by different sensors and also differ thematically.

  7. Extracting information of fixational eye movements through pupil tracking

    Science.gov (United States)

    Xiao, JiangWei; Qiu, Jian; Luo, Kaiqin; Peng, Li; Han, Peng

    2018-01-01

    Human eyes are never completely static, even when they are fixating on a stationary point. These small, irregular movements, which consist of micro-tremors, micro-saccades and drifts, prevent the fading of the images that enter our eyes. The importance of studying fixational eye movements has recently been demonstrated experimentally. However, the characteristics of fixational eye movements and their roles in visual processing have not been explained clearly, because these signals could hardly be extracted completely until now. In this paper, we developed a new eye movement detection device with a high-speed camera. The device includes a beam splitter mirror, an infrared light source and a high-speed digital video camera with a frame rate of 200 Hz. To avoid the influence of head shaking, we made the device wearable by fixing the camera on a safety helmet. Pupil tracking experiments were conducted using this device. By localizing the pupil center and applying spectrum analysis, the envelope frequency spectra of micro-saccades, micro-tremors and drifts are clearly shown. The experimental results show that the device is feasible and effective, so it can be applied in further characteristic analysis.
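
    The spectrum analysis step can be illustrated with a discrete Fourier transform of a pupil-center trace sampled at the camera frame rate of 200 Hz. The sketch below is not the authors' processing chain: the synthetic trace, its 40 Hz component and the function name are assumptions for illustration. At 200 Hz, only components below the 100 Hz Nyquist limit can be resolved.

```python
import numpy as np

def amplitude_spectrum(trace, sample_rate_hz):
    """Single-sided amplitude spectrum of a 1-D position trace."""
    trace = np.asarray(trace, dtype=float)
    trace = trace - trace.mean()  # remove the constant offset
    spectrum = np.abs(np.fft.rfft(trace)) * 2.0 / len(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / sample_rate_hz)
    return freqs, spectrum

fs = 200.0                 # camera frame rate from the abstract, in Hz
t = np.arange(400) / fs    # two seconds of samples
# Synthetic horizontal pupil position: a 40 Hz oscillation plus a slow drift.
trace = 0.5 * np.sin(2 * np.pi * 40.0 * t) + 0.2 * t
freqs, spectrum = amplitude_spectrum(trace, fs)
peak_hz = freqs[np.argmax(spectrum)]
```

    For this synthetic trace the largest spectral peak lies at 40 Hz with an amplitude close to 0.5, while the slow drift contributes only small low-frequency components.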

  8. Financial Information Extraction Using Pre-defined and User-definable Templates in the LOLITA System

    OpenAIRE

    Costantino, Marco; Morgan, Richard G.; Collingham, Russell J.

    1996-01-01

    This paper addresses the issue of information extraction in the financial domain within the framework of a large Natural Language Processing system: LOLITA. The LOLITA system, Large-scale Object-based Linguistic Interactor Translator and Analyser, is a general purpose natural language processing system. Different kinds of applications have been built around the system's core. One of these is the financial information extraction application, which has been designed in close contact with expert...

  9. Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media

    OpenAIRE

    Bhargava, Preeti; Spasojevic, Nemanja; Hu, Guoning

    2017-01-01

    In this paper, we describe the Lithium Natural Language Processing (NLP) system - a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and acad...

  10. Metagenomic data analysis : computational methods and applications

    NARCIS (Netherlands)

    Gori, F.

    2013-01-01

    Metagenomics is the study of the genomic content of microbial communities, acquired through DNA sequencing technology. The main advantage of metagenomics is that it can overcome the limitations of individual genome sequencing, that can work only on the few culturable microbes. Unfortunately, the

  11. Back to the Future of Soil Metagenomics.

    Czech Academy of Sciences Publication Activity Database

    Nesme, J.; Achouak, W.; Agathos, S.N.; Bailey, M.; Baldrian, Petr; Brunel, D.; Frostegård, Å.; Heulin, T.; Jansson, J.K.; Jurkevitch, E.; Kruus, K.L.; Kowalchuk, G.A.; Lagares, A.; Lapin-Scott, H.M.; Lemanceau, P.; Le Paslier, D.; Mandic-Mulec, I.; Murrell, J.C.; Myrold, D.D.; Nalin, R.; Nannipieri, P.; Neufeld, J.D.; O'Gara, F.; Parnell, J.J.; Pühler, A.; Pylro, V.; Ramos, J.L.; Roesch, L.F.; Schloter, M.; Schleper, C.; Sczyrba, A.; Sessitsch, A.; Sjöling, S.; Sørensen, J.; Sørensen, S.J.; Tebbe, C.C.; Topp, E.; Tsiamis, G.; van Elsas, J.D.; van Keulen, G.; Widmer, F.; Wagner, M.; Zhang, T.; Zhang, X.; Zhao, L.; Zhu, Y-G.; Vogel, T.M.; Simonet, P.

    2016-01-01

    Vol. 7, Feb 10 (2016), p. 73. ISSN 1664-302X. Institutional support: RVO:61388971. Keywords: metagenomic; soil microbiology; terrestrial microbiology. Subject RIV: EE - Microbiology, Virology. Impact factor: 4.076, year: 2016

  12. Culture-independent discovery of natural products from soil metagenomes.

    Science.gov (United States)

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  13. Cyclodipeptides from metagenomic library of a Japanese marine sponge

    International Nuclear Information System (INIS)

    He, Rui; Wang, Bochu; Zhub, Liancai; Wang, Manyuan; Wakimoto, Toshiyuki; Abe, Ikuro

    2013-01-01

    Culture-independent metagenomics is an attractive and promising approach to explore unique bioactive small molecules from marine sponges harboring uncultured symbiotic microbes. Therefore, we conducted functional screening of the metagenomic library constructed from the Japanese marine sponge Discodermia calyx. Bioassay-guided fractionation of plate culture extract of antibacterial clone pDC113 afforded eleven cyclodipeptides: Cyclo(l-Thr-l-Leu) (1), Cyclo(l-Val-d-Pro) (2), Cyclo(l-Ile-d-Pro) (3), Cyclo(l-Leu-l-Pro) (4), Cyclo(l-Val-l-Leu) (5), Cyclo(l-Leu-l-Ile) (6), Cyclo(l-Leu-l-Leu) (7), Cyclo(l-Phe-l-Tyr) (8), Cyclo(l-Trp-l-Pro) (9), Cyclo(l-Val-l-Trp) (10) and Cyclo(l-Ile-l-Trp) (11). To the best of our knowledge, these are the first cyclodipeptides isolated from a metagenomic library. Sequence analysis suggested that the isolated cyclodipeptides were not synthesized by nonribosomal peptide synthetases and there was no significant indication of cyclodipeptide synthetases. (author)

  14. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Directory of Open Access Journals (Sweden)

    Jens Roat Kultima

    MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  15. BeerDeCoded: the open beer metagenome project

    Science.gov (United States)

    Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo

    2017-01-01

    Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer. PMID:29123645

  16. Cyclodipeptides from metagenomic library of a Japanese marine sponge

    Energy Technology Data Exchange (ETDEWEB)

    He, Rui; Wang, Bochu; Zhub, Liancai, E-mail: wangbc2000@126.com [Bioengineering College, Chongqing University, Chongqing (China)]; Wang, Manyuan [School of Traditional Chinese Medicine, Capital University of Medical Sciences, Beijing (China)]; Wakimoto, Toshiyuki; Abe, Ikuro, E-mail: abei@mol.f.u-tokyo.ac.jp [Graduate School of Pharmaceutical Sciences, The University of Tokyo, Tokyo (Japan)]

    2013-12-01

    Culture-independent metagenomics is an attractive and promising approach to explore unique bioactive small molecules from marine sponges harboring uncultured symbiotic microbes. Therefore, we conducted functional screening of the metagenomic library constructed from the Japanese marine sponge Discodermia calyx. Bioassay-guided fractionation of plate culture extract of antibacterial clone pDC113 afforded eleven cyclodipeptides: Cyclo(l-Thr-l-Leu) (1), Cyclo(l-Val-d-Pro) (2), Cyclo(l-Ile-d-Pro) (3), Cyclo(l-Leu-l-Pro) (4), Cyclo(l-Val-l-Leu) (5), Cyclo(l-Leu-l-Ile) (6), Cyclo(l-Leu-l-Leu) (7), Cyclo(l-Phe-l-Tyr) (8), Cyclo(l-Trp-l-Pro) (9), Cyclo(l-Val-l-Trp) (10) and Cyclo(l-Ile-l-Trp) (11). To the best of our knowledge, these are the first cyclodipeptides isolated from a metagenomic library. Sequence analysis suggested that the isolated cyclodipeptides were not synthesized by nonribosomal peptide synthetases and there was no significant indication of cyclodipeptide synthetases. (author)

  17. BeerDeCoded: the open beer metagenome project.

    Science.gov (United States)

    Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo

    2017-01-01

    Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer.

  18. Data-Driven Information Extraction from Chinese Electronic Medical Records.

    Directory of Open Access Journals (Sweden)

    Dong Xu

    This study aims to propose a data-driven framework that takes unstructured free text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. Our framework uses a hybrid approach. It consists of constructing cross-domain core medical lexica, an unsupervised, iterative algorithm to accrue more accurate terms into the lexica, rules to address Chinese writing conventions and temporal descriptors, and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions. The effectiveness of the framework was demonstrated with a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica were capable of recognizing terms with an F1-score of 0.896. 98.5% of recorded medical events were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874. The end-to-end time-event-description extraction of our framework achieved an F1-score of 0.846. In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838). The framework is data-driven, weakly supervised, and robust against the variations and noises that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing styles through patterns used for discovering new terms and rules for updating the lexica.
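
    The abstract does not spell out how the Normalized Google Distance is computed. The following is a minimal sketch, assuming the standard definition by Cilibrasi and Vitányi, which may differ from the exact variant used in the framework. The function name and the hit counts are hypothetical.

```python
import math


def normalized_google_distance(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """NGD from hit counts: f_x and f_y are the counts for each term alone,
    f_xy the count for both terms together, n the total number of indexed pages."""
    if min(f_x, f_y, f_xy) <= 0:
        return math.inf  # terms that never (co-)occur are treated as maximally distant
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    denominator = math.log(n) - min(log_x, log_y)
    if denominator == 0:
        return 0.0  # both terms appear on every page, so they cannot be told apart
    return (max(log_x, log_y) - log_xy) / denominator


# Hypothetical counts for a medical event and a candidate description.
always_together = normalized_google_distance(1000, 1000, 1000, 1_000_000)
rarely_together = normalized_google_distance(1000, 1000, 10, 1_000_000)
```

    A smaller value indicates that the two terms tend to occur together, which is the kind of signal a classifier such as an SVM could take as an input feature.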

  19. Extracting information from textual documents in the electronic health record: a review of recent research.

    Science.gov (United States)

    Meystre, S M; Savova, G K; Kipper-Schuler, K C; Hurdle, J F

    2008-01-01

    We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR). Literature review of the research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included. 174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual features detection and analysis, extraction of information in general, extraction of codes and of information for decision-support and enrichment of the EHR, information extraction for surveillance, research, automated terminology management, and data mining, and de-identification of clinical text. Performance of information extraction systems with clinical text has improved since the last systematic review in 1995, but they are still rarely applied outside of the laboratory they have been developed in. Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora, and further improvements in system performance are important factors to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.

  20. Omnidirectional vision systems calibration, feature extraction and 3D information

    CERN Document Server

    Puig, Luis

    2013-01-01

    This work focuses on central catadioptric systems, from the early step of calibration to high-level tasks such as 3D information retrieval. The book opens with a thorough introduction to the sphere camera model, along with an analysis of the relation between this model and actual central catadioptric systems. Then, a new approach to calibrate any single-viewpoint catadioptric camera is described.  This is followed by an analysis of existing methods for calibrating central omnivision systems, and a detailed examination of hybrid two-view relations that combine images acquired with uncalibrated

  1. Extracting spatial information from large aperture exposures of diffuse sources

    Science.gov (United States)

    Clarke, J. T.; Moos, H. W.

    1981-01-01

    The spatial properties of large aperture exposures of diffuse emission can be used both to investigate spatial variations in the emission and to filter out camera noise in exposures of weak emission sources. Spatial imaging can be accomplished both parallel and perpendicular to dispersion with a resolution of 5-6 arc sec, and a narrow median filter running perpendicular to dispersion across a diffuse image selectively filters out point source features, such as reseaux marks and fast particle hits. Spatial information derived from observations of solar system objects is presented.
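
    As an illustration of the filtering step, here is a minimal sketch of a narrow median filter applied along one image axis. It assumes the image is stored as a list of rows with the dispersion direction along each row, so that "perpendicular to dispersion" means down each column. The function name, window size, and toy image are illustrative assumptions, not the authors' reduction code.

```python
from statistics import median


def median_filter_columns(image, window=3):
    """Replace each pixel with the median of a short window running down its column."""
    half = window // 2
    n_rows, n_cols = len(image), len(image[0])
    filtered = [[0.0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for row in range(n_rows):
            lo, hi = max(0, row - half), min(n_rows, row + half + 1)
            filtered[row][col] = median(image[r][col] for r in range(lo, hi))
    return filtered


# Toy exposure: smooth diffuse emission of 10 counts with one single-pixel particle hit.
exposure = [[10.0] * 4 for _ in range(5)]
exposure[2][1] = 500.0
cleaned = median_filter_columns(exposure)
```

    Because the spike occupies a single pixel, it never forms the majority of any three-pixel window, so the median discards it while the extended diffuse signal is left unchanged.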

  2. Metagenomic applications in environmental monitoring and bioremediation.

    Science.gov (United States)

    Techtmann, Stephen M; Hazen, Terry C

    2016-10-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  3. Metagenomic analysis of microbial communities and beyond

    DEFF Research Database (Denmark)

    Schreiber, Lars

    2014-01-01

    From small clone libraries to large next-generation sequencing datasets – the field of community genomics or metagenomics has developed tremendously in recent years. This chapter will summarize some of these developments and will also highlight pitfalls of current metagenomic analyses. It w...... heterologous expression of metagenomic DNA fragments to discover novel metabolic functions. Lastly, the chapter will briefly discuss the meta-analysis of gene expression of microbial communities, more precisely metatranscriptomics and metaproteomics.

  4. Metagenomics: The Next Culture-Independent Game Changer

    Directory of Open Access Journals (Sweden)

    Jessica D. Forbes

    2017-07-01

    A trend towards the abandonment of obtaining pure culture isolates in frontline laboratories is at a crossroads with the ability of public health agencies to perform their basic mandate of foodborne disease surveillance and response. The implementation of culture-independent diagnostic tests (CIDTs), including nucleic acid and antigen-based assays for acute gastroenteritis, is leaving public health agencies without laboratory evidence to link clinical cases to each other and to food or environmental substances. This limits the efficacy of public health epidemiology and surveillance as well as outbreak detection and investigation. Foodborne outbreaks have the potential to remain undetected or have insufficient evidence to support source attribution and may inadvertently increase the incidence of foodborne diseases. Next-generation sequencing of pure culture isolates in clinical microbiology laboratories has the potential to revolutionize the fields of food safety and public health. Metagenomics and other ‘omics’ disciplines could provide the solution to a cultureless future in clinical microbiology, food safety and public health. Data mining of information obtained from metagenomics assays can be particularly useful for the identification of clinical causative agents or foodborne contamination, detection of AMR and/or virulence factors, in addition to providing high-resolution subtyping data. Thus, metagenomics assays may provide a universal test for clinical diagnostics, foodborne pathogen detection, subtyping and investigation. This information has the potential to reform the field of enteric disease diagnostics and surveillance and also infectious diseases as a whole. The aim of this review will be to present the current state of CIDTs in diagnostic and public health laboratories as they relate to foodborne illness and food safety. Moreover, we will also discuss the diagnostic and subtyping utility and concomitant bias limitations of

  5. Statistical techniques to extract information during SMAP soil moisture assimilation

    Science.gov (United States)

    Kolassa, J.; Reichle, R. H.; Liu, Q.; Alemohammad, S. H.; Gentine, P.

    2017-12-01

    Statistical techniques permit the retrieval of soil moisture estimates in a model climatology while retaining the spatial and temporal signatures of the satellite observations. As a consequence, the need for bias correction prior to an assimilation of these estimates is reduced, which could result in a more effective use of the independent information provided by the satellite observations. In this study, a statistical neural network (NN) retrieval algorithm is calibrated using SMAP brightness temperature observations and modeled soil moisture estimates (similar to those used to calibrate the SMAP Level 4 DA system). Daily values of surface soil moisture are estimated using the NN and then assimilated into the NASA Catchment model. The skill of the assimilation estimates is assessed based on a comprehensive comparison to in situ measurements from the SMAP core and sparse network sites as well as the International Soil Moisture Network. The NN retrieval assimilation is found to significantly improve the model skill, particularly in areas where the model does not represent processes related to agricultural practices. Additionally, the NN method is compared to assimilation experiments using traditional bias correction techniques. The NN retrieval assimilation is found to more effectively use the independent information provided by SMAP resulting in larger model skill improvements than assimilation experiments using traditional bias correction techniques.

  6. Research on Crowdsourcing Emergency Information Extraction of Based on Events' Frame

    Science.gov (United States)

    Yang, Bo; Wang, Jizhou; Ma, Weijun; Mao, Xi

    2018-01-01

    At present, common information extraction methods cannot accurately extract structured emergency event information, general information retrieval tools cannot fully identify emergency geographic information, and neither provides an accurate assessment of the extraction results. This paper therefore proposes an emergency information collection technique based on an event framework to address the problem of emergency information extraction. It consists mainly of an emergency information extraction model (EIEM), a complete address recognition method (CARM) and an accuracy evaluation model of emergency information (AEMEI). EIEM extracts emergency information in structured form and compensates for the lack of network data acquisition in emergency mapping. CARM uses a hierarchical model and a shortest path algorithm to join toponym fragments into a full address. AEMEI analyzes the extraction results for emergency events and summarizes the advantages and disadvantages of the event framework. Experiments show that the event framework technique can solve the problem of emergency information extraction and provides reference cases for other applications. When a disaster is about to occur, the relevant departments can query data on past emergencies and make arrangements ahead of time for disaster defense and reduction. The technique helps reduce casualties and property damage nationally and worldwide, which is of great significance to the state and society.

  7. A Delphi Technology Foresight Study: Mapping Social Construction of Scientific Evidence on Metagenomics Tests for Water Safety.

    Directory of Open Access Journals (Sweden)

    Stanislav Birko

    Access to clean water is a grand challenge in the 21st century. Water safety testing for pathogens currently depends on surrogate measures such as fecal indicator bacteria (e.g., E. coli). Metagenomics concerns high-throughput, culture-independent, unbiased shotgun sequencing of DNA from environmental samples that might transform water safety by detecting waterborne pathogens directly instead of their surrogates. Yet emerging innovations such as metagenomics are often fiercely contested. Innovations are subject to shaping/construction not only by technology but also social systems/values in which they are embedded, such as experts' attitudes towards new scientific evidence. We conducted a classic three-round Delphi survey, comprised of 107 questions. A multidisciplinary expert panel (n = 24) representing the continuum of discovery scientists and policymakers evaluated the emergence of metagenomics tests. To the best of our knowledge, we report here the first Delphi foresight study of experts' attitudes on (1) the top 10 priority evidentiary criteria for adoption of metagenomics tests for water safety, (2) the specific issues critical to governance of metagenomics innovation trajectory where there is consensus or dissensus among experts, (3) the anticipated time lapse from discovery to practice of metagenomics tests, and (4) the role and timing of public engagement in development of metagenomics tests. The ability of a test to distinguish between harmful and benign waterborne organisms, analytical/clinical sensitivity, and reproducibility were the top three evidentiary criteria for adoption of metagenomics. Experts agree that metagenomic testing will provide novel information but there is dissensus on whether metagenomics will replace the current water safety testing methods or impact the public health end points (e.g., reduction in boil water advisories). Interestingly, experts view the publics relevant in a "downstream capacity" for adoption of

  8. A Delphi Technology Foresight Study: Mapping Social Construction of Scientific Evidence on Metagenomics Tests for Water Safety.

    Science.gov (United States)

    Birko, Stanislav; Dove, Edward S; Özdemir, Vural

    2015-01-01

    Access to clean water is a grand challenge in the 21st century. Water safety testing for pathogens currently depends on surrogate measures such as fecal indicator bacteria (e.g., E. coli). Metagenomics concerns high-throughput, culture-independent, unbiased shotgun sequencing of DNA from environmental samples that might transform water safety by detecting waterborne pathogens directly instead of their surrogates. Yet emerging innovations such as metagenomics are often fiercely contested. Innovations are subject to shaping/construction not only by technology but also social systems/values in which they are embedded, such as experts' attitudes towards new scientific evidence. We conducted a classic three-round Delphi survey, comprised of 107 questions. A multidisciplinary expert panel (n = 24) representing the continuum of discovery scientists and policymakers evaluated the emergence of metagenomics tests. To the best of our knowledge, we report here the first Delphi foresight study of experts' attitudes on (1) the top 10 priority evidentiary criteria for adoption of metagenomics tests for water safety, (2) the specific issues critical to governance of metagenomics innovation trajectory where there is consensus or dissensus among experts, (3) the anticipated time lapse from discovery to practice of metagenomics tests, and (4) the role and timing of public engagement in development of metagenomics tests. The ability of a test to distinguish between harmful and benign waterborne organisms, analytical/clinical sensitivity, and reproducibility were the top three evidentiary criteria for adoption of metagenomics. Experts agree that metagenomic testing will provide novel information but there is dissensus on whether metagenomics will replace the current water safety testing methods or impact the public health end points (e.g., reduction in boil water advisories). Interestingly, experts view the publics relevant in a "downstream capacity" for adoption of metagenomics rather

  9. A novel bioinformatics strategy for searching industrially useful genome resources from metagenomic sequence libraries.

    Science.gov (United States)

    Uehara, Hiroshi; Iwasaki, Yuki; Wada, Chieko; Ikemura, Toshimichi; Abe, Takashi

    2011-01-01

    Although remarkable progress in metagenomic sequencing of various environmental samples has been made, large numbers of fragment sequences have been registered in the international DNA databanks, primarily without information on gene function and phylotype, and thus with limited usefulness. Industrially useful biological activity is often carried out by a set of genes, such as those constituting an operon. In this connection, metagenomic approaches have a weakness because sets of the genes are usually split up, since the sequences obtained by metagenome analyses are fragmented into 1-kb or much shorter segments. Therefore, even when a set of genes responsible for an industrially useful function is found in one metagenome library, it is usually difficult to know whether a single genome harbors the entire gene set or whether different genomes have individual genes. By modifying the Self-Organizing Map (SOM), we previously developed BLSOM for oligonucleotide composition, which allowed classification (self-organization) of sequence fragments according to genomes. Because BLSOM could reassociate genomic fragments according to genomes, BLSOM may ameliorate the abovementioned weakness of metagenome analyses. Here, we have developed a strategy for clustering of metagenomic sequences according to phylotypes and genomes, by testing a gene set contributing to environment preservation.
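
    BLSOM itself is not described here in enough detail to reproduce. As a minimal sketch of the kind of input it works on, the oligonucleotide composition of a fragment can be computed as a fixed-length vector. The choice of tetranucleotides, the use of plain relative frequencies, and the function name are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def oligonucleotide_composition(sequence, k=4):
    """Return the relative frequency of every k-mer over A, C, G, T, in a fixed order."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)  # windows with N or other symbols are ignored
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


fragment = "ACGTACGTACGT"  # toy fragment; real metagenomic fragments are far longer
vector = oligonucleotide_composition(fragment)
```

    Each fragment becomes a point in a 256-dimensional space, and fragments with similar composition lie close together, which is what allows a clustering method to group them.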

  10. Vascular Extraction Using MRA Statistics and Gradient Information

    Directory of Open Access Journals (Sweden)

    Shifeng Zhao

    2018-01-01

    Brain vessel segmentation is a fundamental component of cerebral disease screening systems. However, detecting vessels is still a challenging task owing to their complex appearance and thinning geometry as well as the contrast decrease from the root of the vessel to its thin branches. We present a method for segmentation of the vasculature in Magnetic Resonance Angiography (MRA) images. First, we apply volume projection, 2D segmentation, and back-projection procedures for a first stage of background subtraction and vessel reservation. Voxels labeled as background or vessel are excluded from consideration in later computation. Second, a stochastic expectation maximization (SEM) algorithm is used to estimate the probability density function (PDF) of the remaining voxels, which are assumed to follow a mixture of one Rayleigh and two Gaussian distributions. These voxels can then be classified into background, middle region, or vascular structure. Third, we adapt the K-means method, which is based on the gradient of the remaining voxels, to effectively detect true positives around boundaries of vessels. Experimental results on clinical cerebral data demonstrate that using gradient information as a further step improves the mixture-model-based segmentation of cerebral vasculature, in particular segmentation of the low-contrast vasculature.
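
    To make the second step concrete, here is a minimal sketch of classifying a voxel intensity under a mixture of one Rayleigh and two Gaussian components. It assumes the Rayleigh component models the background and the two Gaussians model the middle region and the vessels, which the abstract does not state. All parameter values are hypothetical placeholders for what SEM would estimate.

```python
import math


def rayleigh_pdf(x, sigma):
    """Rayleigh density for x >= 0 with scale sigma."""
    return (x / sigma**2) * math.exp(-x**2 / (2 * sigma**2)) if x >= 0 else 0.0


def gaussian_pdf(x, mean, std):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std**2)) / (std * math.sqrt(2 * math.pi))


# Hypothetical mixture parameters; in the paper they would be estimated by SEM.
WEIGHTS = {"background": 0.6, "middle": 0.3, "vessel": 0.1}
RAYLEIGH_SIGMA = 20.0
GAUSSIANS = {"middle": (90.0, 25.0), "vessel": (200.0, 30.0)}


def classify_voxel(intensity):
    """Assign the class whose weighted component density is largest at this intensity."""
    scores = {"background": WEIGHTS["background"] * rayleigh_pdf(intensity, RAYLEIGH_SIGMA)}
    for label, (mean, std) in GAUSSIANS.items():
        scores[label] = WEIGHTS[label] * gaussian_pdf(intensity, mean, std)
    return max(scores, key=scores.get)


labels = [classify_voxel(v) for v in (15.0, 95.0, 210.0)]
```

    Picking the largest weighted density is equivalent to a maximum a posteriori decision, since the shared normalizing term does not change which class wins.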

  11. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections.

    Directory of Open Access Journals (Sweden)

    Samuele Bovo

    Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18) previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus infected pigs providing information that could be important to define occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets non-originally derived by metagenomics experiments for viral metagenomics analyses in a livestock species.

  12. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections.

    Science.gov (United States)

    Bovo, Samuele; Mazzoni, Gianluca; Ribani, Anisa; Utzeri, Valerio Joe; Bertolini, Francesca; Schiavo, Giuseppina; Fontanesi, Luca

    2017-01-01

    Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18) previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus infected pigs providing information that could be important to define occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets non-originally derived by metagenomics experiments for viral metagenomics analyses in a livestock species.
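
    The abstract does not say which tools were used to collect the unmapped reads. As a minimal sketch, assuming the alignments are available as SAM text, the unmapped reads can be picked out by testing bit 0x4 of the FLAG field, which the SAM format defines as "segment unmapped". The function name and the toy records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Yield the names of reads whose FLAG field has the unmapped bit set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank lines and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            yield name


# Toy alignment records (hypothetical read names and a placeholder reference name).
sam_text = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]
candidates = list(unmapped_read_names(sam_text))
```

    The reads collected this way would then be the input for mapping or assembly against a viral reference collection.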

  13. Metagenomic approaches to understanding phylogenetic diversity in quorum sensing.

    Science.gov (United States)

    Kimura, Nobutada

    2014-04-01

    Quorum sensing, a form of cell-cell communication among bacteria, allows bacteria to synchronize their behaviors at the population level in order to control behaviors such as luminescence, biofilm formation, signal turnover, pigment production, antibiotics production, swarming, and virulence. A better understanding of quorum-sensing systems will provide us with greater insight into the complex interaction mechanisms used widely in the Bacteria and even the Archaea domain in the environment. Metagenomics, the use of culture-independent sequencing to study the genomic material of microorganisms, has the potential to provide direct information about the quorum-sensing systems in uncultured bacteria. This article provides an overview of the current knowledge of quorum sensing focused on phylogenetic diversity, and presents examples of studies that have used metagenomic techniques. Future technologies potentially related to quorum-sensing systems are also discussed.

  14. Visualization of health information with predications extracted using natural language processing and filtered using the UMLS.

    Science.gov (United States)

    Miller, Trudi; Leroy, Gondy

    2008-11-06

    Increased availability of and reliance on written health information can tax the abilities of unskilled readers. We are developing a system that uses natural language processing to extract phrases, identify medical terms using the UMLS, and visualize the propositions. This system substantially reduces the amount of information a consumer must read, while providing an alternative to traditional prose based text.

  15. Information analysis of iris biometrics for the needs of cryptology key extraction

    Directory of Open Access Journals (Sweden)

    Adamović Saša

    2013-01-01

    The paper presents a rigorous analysis of iris biometric information for the synthesis of an optimized system for the extraction of a high quality cryptology key. Estimations of local entropy and mutual information were identified as segments of the iris most suitable for this purpose. In order to optimize parameters, corresponding wavelets were transformed, in order to obtain the highest possible entropy and mutual information lower in the transformation domain, which set frameworks for the synthesis of systems for the extraction of truly random sequences of iris biometrics, without compromising authentication properties. [Project of the Ministry of Science of the Republic of Serbia, nos. TR32054 and III44006]

  16. Bioprospecting metagenomes: glycosyl hydrolases for converting biomass

    Directory of Open Access Journals (Sweden)

    Monchy Sebastien

    2009-05-01

    Throughout immeasurable time, microorganisms evolved and accumulated remarkable physiological and functional heterogeneity, and now constitute the major reserve for genetic diversity on earth. Using metagenomics, namely genetic material recovered directly from environmental samples, this biogenetic diversification can be accessed without the need to cultivate cells. Accordingly, microbial communities and their metagenomes, isolated from biotopes with high turnover rates of recalcitrant biomass, such as lignocellulosic plant cell walls, have become a major resource for bioprospecting; furthermore, this material is a major asset in the search for new biocatalytics (enzymes) for various industrial processes, including the production of biofuels from plant feedstocks. However, despite the contributions from metagenomics technologies consequent upon the discovery of novel enzymes, this relatively new enterprise requires major improvements. In this review, we compare function-based metagenome screening and sequence-based metagenome data mining, discussing the advantages and limitations of both methods. We also describe the unusual enzymes discovered via metagenomics approaches, and discuss the future prospects for metagenome technologies.

  17. Human milk metagenome: a functional capacity analysis

    Science.gov (United States)

    2013-01-01

    Background: Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. Results: The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < …). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions: Our results further expand the complexity of the human milk metagenome and enforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the functionality of the human

  18. High-resolution metagenomics targets major functional types in complex microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    Kalyuzhnaya, Marina G.; Lapidus, Alla; Ivanova, Natalia; Copeland, Alex C.; McHardy, Alice C.; Szeto, Ernest; Salamov, Asaf; Grigoriev, Igor V.; Suciu, Dominic; Levine, Samuel R.; Markowitz, Victor M.; Rigoutsos, Isidore; Tringe, Susannah G.; Bruce, David C.; Richardson, Paul M.; Lidstrom, Mary E.; Chistoserdova, Ludmila

    2009-08-01

    Most microbes in the biosphere remain uncultured and unknown. Whole genome shotgun (WGS) sequencing of environmental DNA (metagenomics) allows glimpses into genetic and metabolic potentials of natural microbial communities. However, in communities of high complexity, metagenomics fails to link specific microbes to specific ecological functions. To overcome this limitation, we selectively targeted populations involved in oxidizing single-carbon (C1) compounds in Lake Washington (Seattle, USA) by labeling their DNA via stable isotope probing (SIP), followed by WGS sequencing. Metagenome analysis demonstrated specific sequence enrichments in response to different C1 substrates, highlighting ecological roles of individual phylotypes. We further demonstrated the utility of our approach by extracting a nearly complete genome of a novel methylotroph, Methylotenera mobilis, reconstructing its metabolism and conducting genome-wide analyses. This approach, which allows high-resolution genomic analysis of ecologically relevant species, has the potential to be applied to a wide variety of ecosystems.

  19. Extraction of Hidden Social Networks from Wiki-Environment Involved in Information Conflict

    OpenAIRE

    Rasim M. Alguliyev; Ramiz M. Aliguliyev; Irada Y. Alakbarova

    2016-01-01

    Social network analysis is a widely used technique to analyze relationships among wiki-users in Wikipedia. In this paper, a method is proposed to identify hidden social networks participating in information conflicts in a wiki-environment. In particular, we describe how text clustering techniques can be used to extract hidden social networks of wiki-users involved in an information conflict. By clustering unstructured text articles that caused an information conflict, we ...

  20. [Land salinization information extraction method based on HSI hyperspectral and TM imagery].

    Science.gov (United States)

    Li, Jin; Zhao, Geng-Xing; Chang, Chun-Yan; Liu, Hai-Teng

    2014-02-01

    This paper chose the typical salinization area in Kenli County of the Yellow River Delta as the study area, selected the HJ-1A satellite HSI image of March 15, 2011 and the TM image of March 22, 2011 as sources of information, and pre-processed these data by image cropping, geometric correction and atmospheric correction. Spectral characteristics of the main land use types, including lands with different degrees of salinization, water and shoals, were analyzed to find distinct bands for information extraction. A land use information extraction model was built by adopting quantitative and qualitative rules combining the spectral characteristics and the content of soil salinity. Land salinization information was extracted via image classification using the decision tree method. The remote sensing image interpretation accuracy was verified by land salinization degree, which was determined through soil salinity chemical analysis of soil sampling points. In addition, the classification accuracies of the hyperspectral and multi-spectral images were analyzed and compared. The results showed that the overall image classification accuracy of HSI was 96.43% and the Kappa coefficient was 95.59%, while the overall image classification accuracy of TM was 89.17% and the Kappa coefficient was 86.74%. Therefore, compared to multi-spectral TM data, the hyperspectral imagery could be more accurate and efficient for land salinization information extraction. Also, the classification map showed that the soil salinity distinction degree of the hyperspectral image was higher than that of the multi-spectral image. This study explored land salinization information extraction techniques from hyperspectral imagery, extracted the spatial distribution and area ratio information of lands with different degrees of salinization, and provided a decision-making basis for the scientific utilization and management of coastal salinization land resources in the Yellow River Delta.
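
    The two figures of merit reported in this record, overall classification accuracy and the Kappa coefficient, are both derived from a confusion matrix. The following is a minimal Python sketch of the standard definitions (observed agreement, and Cohen's kappa as agreement corrected for chance); the 2x2 matrix is a hypothetical example and is unrelated to the HSI or TM results of the study.

```python
from typing import Sequence


def overall_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal (rows = reference, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e the chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if p_e == 1.0:  # degenerate case: every sample falls in a single class
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical two-class confusion matrix (e.g. saline vs. non-saline pixels).
confusion = [
    [45, 5],
    [10, 40],
]
accuracy = overall_accuracy(confusion)  # 85 of 100 samples on the diagonal
kappa = cohen_kappa(confusion)
```

    Because kappa discounts the agreement expected by chance, it is lower than the overall accuracy for the same matrix, as in the figures quoted above.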

  1. EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.

    Science.gov (United States)

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.

  2. Reconstruction of ribosomal RNA genes from metagenomic data.

    Directory of Open Access Journals (Sweden)

    Lu Fan

    Direct sequencing of environmental DNA (metagenomics) has a great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

  3. MedTime: a temporal information extraction system for clinical narratives.

    Science.gov (United States)

    Lin, Yu-Kai; Chen, Hsinchun; Brown, Randall A

    2013-12-01

    Temporal information extraction from clinical narratives is of critical importance to many clinical applications. We participated in the EVENT/TIMEX3 track of the 2012 i2b2 clinical temporal relations challenge, and presented our temporal information extraction system, MedTime. MedTime comprises a cascade of rule-based and machine-learning pattern recognition procedures. It achieved a micro-averaged f-measure of 0.88 in both the recognitions of clinical events and temporal expressions. We proposed and evaluated three time normalization strategies to normalize relative time expressions in clinical texts. The accuracy was 0.68 in normalizing temporal expressions of dates, times, durations, and frequencies. This study demonstrates and evaluates the integration of rule-based and machine-learning-based approaches for high performance temporal information extraction from clinical narratives. Copyright © 2013 Elsevier Inc. All rights reserved.
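
    For readers unfamiliar with the metric quoted in this record, the following minimal Python sketch shows one common way a micro-averaged F-measure is computed: true positives, false positives and false negatives are pooled across groups before precision and recall are taken. The counts and group names are hypothetical and do not reproduce the MedTime evaluation, whose exact averaging scheme is not described in the abstract.

```python
from typing import Dict, Tuple


def micro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-group (true_pos, false_pos, false_neg) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:  # no correct predictions: precision and recall are both zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two annotation types.
example_counts = {
    "EVENT": (80, 10, 20),
    "TIMEX3": (40, 10, 0),
}
f_micro = micro_f_measure(example_counts)  # pooled: tp=120, fp=20, fn=20
```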

  4. Automatically extracting clinically useful sentences from UpToDate to support clinicians' information needs.

    Science.gov (United States)

    Mishra, Rashmi; Del Fiol, Guilherme; Kilicoglu, Halil; Jonnalagadda, Siddhartha; Fiszman, Marcelo

    2013-01-01

    Clinicians raise several information needs in the course of care. Most of these needs can be met by online health knowledge resources such as UpToDate. However, finding relevant information in these resources often requires significant time and cognitive effort. The objective was to design and assess algorithms for extracting from UpToDate the sentences that represent the most clinically useful information for patient care decision making. We developed algorithms based on semantic predications extracted with SemRep, a semantic natural language processing parser. Two algorithms were compared against a gold standard composed of UpToDate sentences rated in terms of clinical usefulness. Clinically useful sentences were strongly correlated with predication frequency (correlation = 0.95). The two algorithms did not differ in terms of top ten precision (53% vs. 49%; p = 0.06). Semantic predications may serve as the basis for extracting clinically useful sentences. Future research is needed to improve the algorithms.

  5. Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.

    Science.gov (United States)

    Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua

    2014-01-01

    Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents with both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.

  6. Information Extraction of High-Resolution Remotely Sensed Image Based on Multiresolution Segmentation

    Directory of Open Access Journals (Sweden)

    Peng Shao

    2014-08-01

    The principle of multiresolution segmentation is presented in detail in this study, and the Canny algorithm was applied for edge detection in a remotely sensed image based on this principle. The target image was divided into regions based on object-oriented multiresolution segmentation and edge detection. Furthermore, an object hierarchy was created, and a series of features (water bodies, vegetation, roads, residential areas, bare land and other information) were extracted using spectral and geometrical features. The results indicate that edge detection has a positive effect on multiresolution segmentation, and that the overall accuracy of information extraction, as assessed by the confusion matrix, reaches 94.6%.

  7. [Pathology and viral metagenomics, a recent history].

    Science.gov (United States)

    Bernardo, Pauline; Albina, Emmanuel; Eloit, Marc; Roumagnac, Philippe

    2013-05-01

    The study of human, animal and plant viral diseases has greatly benefited from recent metagenomics developments. Viral metagenomics is a culture-independent approach used to investigate the complete viral genetic populations of a sample. During the last decade, metagenomics concepts and techniques that were first used by ecologists progressively spread into the scientific field of viral pathology. The sample, which for ecologists was originally a fraction of an ecosystem, became for pathologists an organism that hosts millions of microbes and viruses. This new approach, which provides high-resolution qualitative and quantitative data on viral diversity without a priori assumptions, is now revolutionizing the way pathologists decipher viral diseases. This review describes the most recent improvements in high-throughput next generation sequencing methods and discusses the applications of viral metagenomics in viral pathology, including discovery of novel viruses, viral surveillance and diagnostics, large-scale molecular epidemiology, and viral evolution. © 2013 médecine/sciences – Inserm.

  8. Comparative metagenomics of the Red Sea

    KAUST Repository

    Mineta, Katsuhiko

    2016-01-26

    Metagenomics produces a tremendous amount of data that comes from the organisms living in the environment. This big data enables us to examine not only microbial genes but also community structure, interaction and adaptation mechanisms at a specific location and under specific conditions. The Red Sea has several unique characteristics, such as high salinity, high temperature and low nutrition. These features must have contributed to forming a unique microbial community during the evolutionary process. In 2014, we started monthly sampling of metagenomes in the Red Sea under the KAUST-CCF project. In collaboration with Kitasato University, we also collected metagenome data from the ocean in Japan, which shows contrasting features to the Red Sea. Therefore, the comparative metagenomics of those data provides a comprehensive view of the Red Sea microbes, leading to the identification of key microbes, genes and networks related to those environmental differences.

  9. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    Science.gov (United States)

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. 
    Although the scope of our annotation is currently limited to a single

  10. Challenges and Opportunities of Airborne Metagenomics

    KAUST Repository

    Behzad, H.

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles.

  11. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    OpenAIRE

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agric...

  12. Approaching the largest ‘API’: extracting information from the Internet with Python

    OpenAIRE

    Jonathan E. Germann

    2018-01-01

    This article explores the need for libraries to algorithmically access and manipulate the world’s largest API: the Internet. The billions of pages on the ‘Internet API’ (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn progra...
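
    To make the idea of automated extraction from HTML concrete, here is a minimal Python sketch that pulls the page title and link targets out of an HTML string using only the standard library's html.parser module. The class name and the sample page are hypothetical, and the article itself may rely on different libraries; a real script would also need to fetch the page over HTTP first.

```python
from html.parser import HTMLParser
from typing import List


class LinkAndTitleParser(HTMLParser):
    """Collect the <title> text and every href found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")  # attrs is a list of (name, value) pairs
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page content standing in for a downloaded document.
page = (
    "<html><head><title>Catalog</title></head><body>"
    "<a href='/a'>A</a> <a href=\"/b\">B</a><a>no href</a>"
    "</body></html>"
)
parser = LinkAndTitleParser()
parser.feed(page)
```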

  13. Information Extraction to Generate Visual Simulations of Car Accidents from Written Descriptions

    NARCIS (Netherlands)

    Nugues, P.; Dupuy, S.; Egges, A.

    2003-01-01

    This paper describes a system to create animated 3D scenes of car accidents from written reports. The text-to-scene conversion process consists of two stages. An information extraction module creates a tabular description of the accident and a visual simulator generates and animates the scene. We

  14. Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report

    Energy Technology Data Exchange (ETDEWEB)

    Kegelmeyer, Philip W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shead, Timothy M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Dunlavy, Daniel M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2013-09-01

    This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods.

  15. End-to-end information extraction without token-level supervision

    DEFF Research Database (Denmark)

    Palm, Rasmus Berg; Hovy, Dirk; Laws, Florian

    2017-01-01

    and output text. We evaluate our model on the ATIS data set, MIT restaurant corpus and the MIT movie corpus and compare to neural baselines that do use token-level labels. We achieve competitive results, within a few percentage points of the baselines, showing the feasibility of E2E information extraction...

  16. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    This paper proposes a construction scheme for a web page comment information extraction system based on the frequent subtree mining algorithm, referred to as the FSM system. The paper briefly introduces the overall system architecture and its modules, then describes the core of the system in detail, and finally presents the system prototype.

  17. Extracting information from two-dimensional electrophoresis gels by partial least squares regression

    DEFF Research Database (Denmark)

    Jessen, Flemming; Lametsch, R.; Bendixen, E.

    2002-01-01

    Two-dimensional gel electrophoresis (2-DE) produces large amounts of data and extraction of relevant information from these data demands a cautious and time consuming process of spot pattern matching between gels. The classical approach of data analysis is to detect protein markers that appear...

  18. Binning sequences using very sparse labels within a metagenome

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Background: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results: The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. Conclusion: In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of
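
    The notion of assigning fragments to seeds by compositional similarity can be illustrated with a much simpler stand-in than the S-GSOM described in this record. The minimal Python sketch below represents each sequence by its normalized k-mer frequency vector and assigns every fragment to the seed with the nearest profile (Euclidean distance). It is a plain nearest-seed classifier, not a self-organising map; the function names, the choice of k = 2 and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, List


def kmer_profile(seq: str, k: int = 2) -> List[float]:
    """Normalized frequencies of all 4**k k-mers over the alphabet ACGT."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two composition profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_seeds(fragments: Dict[str, str], seeds: Dict[str, str],
                    k: int = 2) -> Dict[str, str]:
    """Label each fragment with the seed whose k-mer profile is closest."""
    seed_profiles = {label: kmer_profile(s, k) for label, s in seeds.items()}
    bins = {}
    for name, seq in fragments.items():
        profile = kmer_profile(seq, k)
        bins[name] = min(seed_profiles,
                         key=lambda label: distance(profile, seed_profiles[label]))
    return bins


# Toy sequences: one AT-rich and one GC-rich seed, plus two unlabeled fragments.
seeds = {"AT_rich": "ATATTAATATTTAAATAT", "GC_rich": "GCGGCCGCGCCGGGCGCC"}
fragments = {"frag1": "TTATAATATAAT", "frag2": "CCGCGGGCCGCG"}
bins = assign_to_seeds(fragments, seeds)
```

    In practice, composition-based binning typically uses longer k-mers (tetranucleotides are a common choice) and much longer fragments than in this toy example.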

  19. Explaining diversity in metagenomic datasets by phylogenetic-based feature weighting.

    Science.gov (United States)

    Albanese, Davide; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

    2015-03-01

    Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

  20. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

    Science.gov (United States)

    Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

    2018-02-26

    Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.

  1. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Yu-Wei [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Simmons, Blake A. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Singer, Steven W. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-29

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  2. Polarity Assignment to Causal Information Extracted from Financial Articles Concerning Business Performance of Companies

    Science.gov (United States)

    Sakai, Hiroyuki; Masuyama, Shigeru

    We propose a method of assigning polarity to causal information extracted from Japanese financial articles concerning the business performance of companies. Our method assigns polarity (positive or negative) to causal information according to business performance, e.g. "zidousya no uriage ga koutyou" (sales of cars are good), to which positive polarity is assigned. First, our method classifies articles concerning business performance into positive articles and negative articles. Using these classified sets of articles, our method then assigns polarity (positive or negative) to causal information extracted from the set of articles concerning business performance. We evaluated our method and it attained 75.3% precision and 47.9% recall for assigning positive polarity, and 77.0% precision and 58.5% recall for assigning negative polarity.

  3. Information extraction from full text scientific articles: Where are the keywords?

    Directory of Open Access Journals (Sweden)

    Perez-Iratxeta Carolina

    2003-05-01

    Background: To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full-text articles in electronic form, which offer larger sources of data, are now available. Several questions arise as to whether the effort of scanning full-text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. Results: In this work we addressed those questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous. Conclusions: Although the abstract contains the best ratio of keywords per total of words, other sections of the article may be a better source of biologically relevant data.

  4. Integrating semantic information into multiple kernels for protein-protein interaction extraction from biomedical literatures.

    Directory of Open Access Journals (Sweden)

    Lishuang Li

    Protein-Protein Interaction (PPI) extraction is an important task in biomedical information extraction. Many machine learning methods for PPI extraction have achieved promising results, but their performance is still not satisfactory. One reason is that semantic resources have largely been ignored. In this paper, we propose a multiple-kernel learning-based approach to extract PPIs, combining a feature-based kernel, a tree kernel and a semantic kernel. In particular, we extend the shortest path-enclosed tree kernel (SPT) with a dynamic extension strategy to retrieve richer syntactic information. Our semantic kernel calculates the protein-protein pair similarity and the context similarity based on two semantic resources: WordNet and Medical Subject Headings (MeSH). We evaluate our method with a Support Vector Machine (SVM) and achieve an F-score of 69.40% and an AUC of 92.00%, which shows that, by integrating semantic information, our method outperforms most of the state-of-the-art systems.

  5. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories

    Directory of Open Access Journals (Sweden)

    Wei Yang

    2018-04-01

    Crowdsourcing trajectory data is an important approach for accessing and updating road information. In this paper, we present a novel approach for extracting road boundary information from crowdsourcing vehicle traces based on Delaunay triangulation (DT). First, an optimization and interpolation method is proposed to filter abnormal trace segments from raw global positioning system (GPS) traces and to interpolate the optimized segments adaptively to ensure there are enough tracking points. Second, the DT and the Voronoi diagram are constructed within the interpolated tracking lines to calculate road boundary descriptors using the area of the Voronoi cell and the length of the triangle edge. A road boundary detection model is then established by integrating the boundary descriptors and trajectory movement features (e.g., direction) by DT. Third, the boundary detection model is used to detect the road boundary from the DT constructed by trajectory lines, and a region-growing method based on seed polygons is proposed to extract the road boundary. Experiments were conducted using the GPS traces of taxis in Beijing, China, and the results show that the proposed method is suitable for extracting the road boundary from low-frequency GPS traces, multi-type road structures, and different time intervals. Compared with two existing methods, the automatically extracted boundary information was shown to be of higher quality.

  6. The metagenomic data life-cycle: standards and best practices

    Energy Technology Data Exchange (ETDEWEB)

    ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo; Corre, Erwan; Fosso, Bruno; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder; Cochrane, Guy

    2017-06-16

    Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonised way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (1) material sampling, (2) material sequencing (3) data analysis and (4) data archiving & publishing. Taking examples from marine research, we summarise essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community but greater awareness and adoption is still needed. We emphasise the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.

  7. Studies on methods and techniques of weak information extraction and integrated evaluation for sandstone-type uranium deposits

    International Nuclear Information System (INIS)

    Han Shaoyang; Ke Dan; Hou Huiqun; Hu Shuiqing

    2004-01-01

    Weak information extraction and integrated evaluation for sandstone-type uranium deposits are currently among the important research topics in uranium exploration. Through several years of research, the authors put forward the meaning of aeromagnetic and aeroradioactive weak information extraction, study the formation theories of aeromagnetic and aeroradioactive weak information, and establish effective mathematical models for weak information extraction. Based on GIS software, the models of weak information extraction are implemented and an expert-grading model for integrated evaluation is developed. Trials of aeromagnetic and aeroradioactive weak information extraction and integrated evaluation of uranium resources are completed using GIS software in the study area. The results show that the techniques of weak information extraction and integrated evaluation can rapidly further delineate the prospective areas of sandstone-type uranium deposits and improve the predictive precision. (authors)

  8. [Extraction of spectral information of additives from activated manganese dioxide products using adaptive kernel independent component analysis].

    Science.gov (United States)

    Wang, Guo-qing; Peng, Yang; Liu, Shao-wen; Sun, Xiao-li; Zhao, Jian-bo; Sun, Yu-an; Liu, Ying-fan

    2011-05-01

    The additives were extracted from the manganese dioxide products with four organic solvents: ether, acetone, chloroform and toluene. The extracts were then baked and their attenuated total reflectance (ATR) FTIR spectra were measured using the liquid membrane method. The number of chemical components of the additives was determined by median absolute deviation (MAD), and the spectral information of the pure components was extracted by kernel independent component analysis (KICA). The extracted spectral information of the additives agrees with that of the compounds used in practice. An adaptive kernel independent component analysis (AKICA) was proposed for direct extraction of spectral information from chemical mixtures. The results demonstrated that the AKICA method provides an alternative approach to extracting the spectral information of pure components directly from chemical mixtures without prior chemical or physical separation.

  9. Viral Metagenomics: MetaView Software

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences of known viral nucleic acid sequence data and enable a user to attempt to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in fasta format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we came up with for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.

  10. Congestive heart failure information extraction framework for automated treatment performance measures assessment.

    Science.gov (United States)

    Meystre, Stéphane M; Kim, Youngjun; Gobbel, Glenn T; Matheny, Michael E; Redd, Andrew; Bray, Bruce E; Garvin, Jennifer H

    2017-04-01

    This paper describes a new congestive heart failure (CHF) treatment performance measure information extraction system - CHIEF - developed as part of the Automated Data Acquisition for Heart Failure project, a Veterans Health Administration project aiming at improving the detection of patients not receiving recommended care for CHF. CHIEF is based on the Apache Unstructured Information Management Architecture framework, and uses a combination of rules, dictionaries, and machine learning methods to extract left ventricular function mentions and values, CHF medications, and documented reasons for a patient not receiving these medications. The training and evaluation of CHIEF were based on subsets of a reference standard of various clinical notes from 1083 Veterans Health Administration patients. Domain experts manually annotated these notes to create our reference standard. Metrics used included recall, precision, and the F1-measure. In general, CHIEF extracted CHF medications with high recall (>0.990) and good precision (0.960-0.978). Mentions of Left Ventricular Ejection Fraction were also extracted with high recall (0.978-0.986) and precision (0.986-0.994), and quantitative values of Left Ventricular Ejection Fraction were found with 0.910-0.945 recall and with high precision (0.939-0.976). Reasons for not prescribing CHF medications were more difficult to extract, only reaching fair accuracy with about 0.310-0.400 recall and 0.250-0.320 precision. This study demonstrated that applying natural language processing to unlock the rich and detailed clinical information found in clinical narrative text notes makes fast and scalable quality improvement approaches possible, eventually improving management and outpatient treatment of patients suffering from CHF. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
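    The F1-measure named in the abstract above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not code from the CHIEF system; the function name `f1_measure` is hypothetical, and pairing the lower ends (and the upper ends) of the reported recall and precision ranges is an assumption made only for illustration.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        # Both metrics are zero: define F1 as zero instead of dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs: ends of the ranges quoted for "reasons for not
# prescribing CHF medications". Pairing low with low and high with high
# is an assumption; the abstract does not say which values go together.
low = f1_measure(precision=0.250, recall=0.310)
high = f1_measure(precision=0.320, recall=0.400)
print(f"F1 between roughly {low:.3f} and {high:.3f}")
```

    Because the harmonic mean is pulled toward the smaller of its two inputs, a system with high recall but poor precision (or the reverse) still receives a low F1.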

  11. Automatically extracting sentences from Medline citations to support clinicians' information needs.

    Science.gov (United States)

    Jonnalagadda, Siddhartha Reddy; Del Fiol, Guilherme; Medlin, Richard; Weir, Charlene; Fiszman, Marcelo; Mostafa, Javed; Liu, Hongfang

    2013-01-01

    Online health knowledge resources contain answers to most of the information needs raised by clinicians in the course of care. However, significant barriers limit the use of these resources for decision-making, especially clinicians' lack of time. In this study we assessed the feasibility of automatically generating knowledge summaries for a particular clinical topic composed of relevant sentences extracted from Medline citations. The proposed approach combines information retrieval and semantic information extraction techniques to identify relevant sentences from Medline abstracts. We assessed this approach in two case studies on the treatment alternatives for depression and Alzheimer's disease. A total of 515 of 564 (91.3%) sentences retrieved in the two case studies were relevant to the topic of interest. About one-third of the relevant sentences described factual knowledge or a study conclusion that can be used for supporting information needs at the point of care. The high rate of relevant sentences is desirable, given that clinicians' lack of time is one of the main barriers to using knowledge resources at the point of care. Sentence rank was not significantly associated with relevancy, possibly due to most sentences being highly relevant. Sentences located closer to the end of the abstract and sentences with treatment and comparative predications were likely to be conclusive sentences. Our proposed technical approach to helping clinicians meet their information needs is promising. The approach can be extended for other knowledge resources and information need types.

  12. From remote sensing data about information extraction for 3D geovisualization - Development of a workflow

    International Nuclear Information System (INIS)

    Tiede, D.

    2010-01-01

    With the increased availability of high (spatial) resolution remote sensing imagery since the late nineties, the need to develop operative workflows for the automated extraction, provision and communication of information from such data has grown. Monitoring requirements aimed at the implementation of environmental or conservation targets, the management of (environmental) resources, and regional planning, as well as international initiatives, especially the joint initiative of the European Commission and ESA (European Space Agency) for Global Monitoring for Environment and Security (GMES), also play a major part. This thesis addresses the development of an integrated workflow for the automated provision of information derived from remote sensing data. With regard to the data used and the fields of application, this work aims to design the workflow to be as generic as possible. The following research questions are discussed: What are the requirements of a workflow architecture that seamlessly links the individual workflow elements in a timely manner and effectively secures the accuracy of the extracted information? How can the workflow retain its efficiency if large volumes of data are processed? How can the workflow be improved with regard to automated object-based image analysis (OBIA)? Which recent developments could be of use? What are the limitations, and which workarounds could be applied in order to generate relevant results? How can relevant information be prepared in a target-oriented way and communicated effectively? How can the more recently developed, freely available virtual globes be used for the delivery of conditioned information, taking the third dimension into account as an additional, explicit carrier of information? Based on case studies comprising different data sets and fields of application, it is demonstrated how methods to extract and process information as well as to effectively communicate results can be improved and successfully combined within one workflow. It is shown that (1

  13. Extracting breathing rate information from a wearable reflectance pulse oximeter sensor.

    Science.gov (United States)

    Johnston, W S; Mendelson, Y

    2004-01-01

    The integration of multiple vital physiological measurements could help combat medics and field commanders to better predict a soldier's health condition and enhance their ability to perform remote triage procedures. In this paper we demonstrate the feasibility of extracting accurate breathing rate information from a photoplethysmographic signal that was recorded by a reflectance pulse oximeter sensor mounted on the forehead and subsequently processed by a simple time domain filtering and frequency domain Fourier analysis.
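    The two processing stages named in the abstract above (time-domain filtering, then Fourier analysis) can be sketched as follows. This is an illustrative sketch on a synthetic signal, not the authors' implementation: the moving-average filter, the 10 Hz sampling rate, the 0.1-0.5 Hz respiratory search band, and the function names are all assumptions.

```python
import cmath
import math


def moving_average(signal, width):
    """Simple time-domain low-pass filter: mean over a sliding window."""
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]


def dominant_frequency(signal, fs, f_lo, f_hi, step=0.01):
    """Scan candidate frequencies (Hz) and return the one with the largest
    Fourier magnitude inside the band [f_lo, f_hi]."""
    mean = sum(signal) / len(signal)
    centered = [v - mean for v in signal]  # remove the DC component
    best_f, best_mag = f_lo, -1.0
    n_steps = int(round((f_hi - f_lo) / step))
    for i in range(n_steps + 1):
        f = f_lo + i * step
        acc = sum(v * cmath.exp(-2j * math.pi * f * n / fs)
                  for n, v in enumerate(centered))
        if abs(acc) > best_mag:
            best_f, best_mag = f, abs(acc)
    return best_f


# Synthetic stand-in for a photoplethysmographic signal (assumed values):
# a 1.2 Hz cardiac component plus a weaker 0.25 Hz respiratory component.
fs = 10.0  # samples per second
t = [n / fs for n in range(600)]  # 60 seconds of data
ppg = [math.sin(2 * math.pi * 1.2 * x) + 0.3 * math.sin(2 * math.pi * 0.25 * x)
       for x in t]

smoothed = moving_average(ppg, width=10)  # attenuates the cardiac component
breaths_per_minute = 60.0 * dominant_frequency(smoothed, fs, 0.1, 0.5)
print(f"Estimated breathing rate: {breaths_per_minute:.1f} breaths/min")
```

    On this synthetic input the estimate should land close to the 15 breaths per minute that was built into the signal; real sensor data would also contain motion artifacts that this sketch does not address.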

  14. A catalog of the mouse gut metagenome.

    Science.gov (United States)

    Xiao, Liang; Feng, Qiang; Liang, Suisha; Sonne, Si Brask; Xia, Zhongkui; Qiu, Xinmin; Li, Xiaoping; Long, Hua; Zhang, Jianfeng; Zhang, Dongya; Liu, Chuan; Fang, Zhiwei; Chou, Joyce; Glanville, Jacob; Hao, Qin; Kotowska, Dorota; Colding, Camilla; Licht, Tine Rask; Wu, Donghai; Yu, Jun; Sung, Joseph Jao Yiu; Liang, Qiaoyi; Li, Junhua; Jia, Huijue; Lan, Zhou; Tremaroli, Valentina; Dworzynski, Piotr; Nielsen, H Bjørn; Bäckhed, Fredrik; Doré, Joël; Le Chatelier, Emmanuelle; Ehrlich, S Dusko; Lin, John C; Arumugam, Manimozhiyan; Wang, Jun; Madsen, Lise; Kristiansen, Karsten

    2015-10-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.

  15. KneeTex: an ontology-driven system for information extraction from MRI reports.

    Science.gov (United States)

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. 
Information extraction results were evaluated

  16. Automatically extracting clinically useful sentences from UpToDate to support clinicians’ information needs

    Science.gov (United States)

    Mishra, Rashmi; Fiol, Guilherme Del; Kilicoglu, Halil; Jonnalagadda, Siddhartha; Fiszman, Marcelo

    2013-01-01

    Clinicians raise several information needs in the course of care. Most of these needs can be met by online health knowledge resources such as UpToDate. However, finding relevant information in these resources often requires significant time and cognitive effort. Objective: To design and assess algorithms for extracting from UpToDate the sentences that represent the most clinically useful information for patient care decision making. Methods: We developed algorithms based on semantic predications extracted with SemRep, a semantic natural language processing parser. Two algorithms were compared against a gold standard composed of UpToDate sentences rated in terms of clinical usefulness. Results: Clinically useful sentences were strongly correlated with predication frequency (correlation= 0.95). The two algorithms did not differ in terms of top ten precision (53% vs. 49%; p=0.06). Conclusions: Semantic predications may serve as the basis for extracting clinically useful sentences. Future research is needed to improve the algorithms. PMID:24551389

  17. Challenges and opportunities of airborne metagenomics.

    Science.gov (United States)

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Comparison of methods of extracting information for meta-analysis of observational studies in nutritional epidemiology

    Directory of Open Access Journals (Sweden)

    Jong-Myon Bae

    2016-01-01

    OBJECTIVES: A common method for conducting a quantitative systematic review (QSR) of observational studies in nutritional epidemiology is the “highest versus lowest intake” method (HLM), in which only the effect size (ES) of the highest intake category of a food item, relative to its lowest category, is collected. In the interval collapsing method (ICM), by contrast, a method suggested to make maximum use of all available information, the ES information is collected by collapsing all categories into a single category. This study aimed to compare the ES and summary effect size (SES) between the HLM and ICM. METHODS: A QSR that evaluated citrus fruit intake and the risk of pancreatic cancer and calculated the SES using the HLM was selected. The ES and SES were estimated by performing a meta-analysis using the fixed-effect model. The directionality and statistical significance of the ES and SES were used as criteria for determining the concordance between the HLM and ICM outcomes. RESULTS: No significant differences were observed in the directionality of the SES extracted using the HLM or ICM. The application of the ICM, which uses a broader information base, yielded more consistent ES and SES, and narrower confidence intervals, than the HLM. CONCLUSIONS: The ICM is advantageous over the HLM owing to its higher statistical accuracy in extracting information for QSR in nutritional epidemiology. The application of the ICM should hence be recommended for future studies.
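    The abstract above estimates a summary effect size with the fixed-effect model. A common form of that model is inverse-variance weighting, sketched below; whether the study used exactly this estimator is an assumption, the function name is hypothetical, and the input numbers are made-up placeholders rather than data from the study.

```python
import math


def fixed_effect_summary(effects, std_errors):
    """Inverse-variance fixed-effect pooling.

    effects:    per-study effect sizes on an additive scale (e.g. log relative risk)
    std_errors: their standard errors
    Returns (summary effect, its standard error, 95% confidence interval).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # w_i = 1 / SE_i^2
    total = sum(weights)
    summary = sum(w * e for w, e in zip(weights, effects)) / total
    se_summary = math.sqrt(1.0 / total)
    ci = (summary - 1.96 * se_summary, summary + 1.96 * se_summary)
    return summary, se_summary, ci


# Hypothetical log relative risks and standard errors (not from the paper).
log_rr = [-0.22, -0.05, -0.31]
se = [0.15, 0.10, 0.25]
summary, se_summary, ci = fixed_effect_summary(log_rr, se)
print(f"Pooled RR = {math.exp(summary):.2f} "
      f"(95% CI {math.exp(ci[0]):.2f} to {math.exp(ci[1]):.2f})")
```

    Under this weighting, adding information that reduces each study's standard error narrows the pooled confidence interval, which is consistent with the narrower intervals the abstract reports for the ICM.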

  19. Gene Prediction in Metagenomic Fragments with Deep Learning

    Directory of Open Access Journals (Sweden)

    Shao-Wu Zhang

    2017-01-01

    Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments that come from thousands of different species. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking networks learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation (10 CV) and independent tests show that Meta-MFDL is a powerful tool for identifying genes in metagenomic fragments.
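    Of the features listed in the abstract above, monocodon usage is the simplest to illustrate: the relative frequency of each of the 64 codons in a candidate open reading frame (ORF). The sketch below is a hedged illustration of that one feature only; the function name, the codon ordering, and the handling of ambiguous bases are assumptions, and it does not reproduce the Meta-MFDL feature pipeline.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed (alphabetical) order, so every ORF maps to a
# feature vector with the same layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def monocodon_usage(orf: str) -> list:
    """Return the 64-dimensional vector of codon relative frequencies.

    The ORF is read in frame from its first base; a trailing partial codon
    and codons containing ambiguous bases (e.g. N) are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Toy fragment: ATG AAA AAA TAA
features = monocodon_usage("ATGAAAAAATAA")
print(dict((c, f) for c, f in zip(CODONS, features) if f > 0))
```

    A vector like this would typically be concatenated with the other feature groups before being passed to a classifier.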

  20. A retrospective metagenomics approach to studying Blastocystis

    DEFF Research Database (Denmark)

    Andersen, Lee O'Brien; Bonde, Ida; Nielsen, Henrik Bjørn

    2015-01-01

    Blastocystis is a common single-celled intestinal parasitic genus, comprising several subtypes. Here, we screened data obtained by metagenomic analysis of faecal DNA for Blastocystis by searching for subtype-specific genes in coabundance gene groups, which are groups of genes that covary across......- and Prevotella-driven enterotypes. This is the first study to investigate the relationship between Blastocystis and communities of gut bacteria using a metagenomics approach. The study serves as an example of how it is possible to retrospectively investigate microbial eukaryotic communities in the gut using...

  1. Comparison of metagenomic samples using sequence signatures

    Directory of Open Access Journals (Sweden)

    Jiang Bai

    2012-12-01

    Background: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently, many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of different environments have been generated. The assembly of these reads can be difficult, and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need complete genomes or existing databases and thus can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the applications of sequence signature methods for the comparison of metagenomic samples have not been well studied. Results: We studied several dissimilarity measures for the comparison of metagenomic samples using sequence signatures, including d2, d2* and d2S recently developed by our group, a measure (hereinafter denoted Hao) used in CVTree developed by Hao’s group (Qi et al., 2004), measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009), as well as standard lp measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples across the world, and 13 fecal samples from human individuals. Results showed that the dissimilarity measure d2S can achieve superior performance when comparing metagenomic samples by clustering them into different groups as well as recovering environmental gradients affecting microbial samples. New insights into the environmental factors affecting microbial compositions in metagenomic samples
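    As a rough illustration of the sequence-signature idea described in the abstract above, the sketch below builds a k-tuple relative-frequency vector from a set of reads and compares two samples with a standard lp distance, the simplest family of measures the abstract mentions. It does not implement d2, d2*, d2S or the Hao measure; the function names and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k):
    """Relative frequencies of all 4**k k-tuples over a collection of reads.

    k-tuples containing characters other than A, C, G, T are skipped.
    """
    alphabet = set("ACGT")
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= alphabet:
                counts[kmer] += 1
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


def lp_distance(u, v, p=2):
    """Standard l_p distance between two frequency vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Toy "samples" (made-up reads, for illustration only).
sample_a = kmer_frequencies(["ACGTACGT", "CGTACG"], k=2)
sample_b = kmer_frequencies(["AAAATTTT", "AATTAA"], k=2)
print(f"l1 = {lp_distance(sample_a, sample_b, p=1):.3f}, "
      f"l2 = {lp_distance(sample_a, sample_b, p=2):.3f}")
```

    A matrix of such pairwise distances between samples is the usual input to the clustering step the abstract refers to.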

  2. Metagenomic Systems Biology of the Human Microbiome

    DEFF Research Database (Denmark)

    Bonde, Ida

    , nose and oral cavity has been analyzed. The central method has been a co-abundance clustering method, which separates genes from metagenomics data under the assumption that genes originating from the same DNA (e.g. a bacterial genome, a phage or a plasmid) will co-vary across samples. Thus, co...... to previous Blastocystis prevalence studies. Moreover, it was found that individuals with a Bacteroides-driven enterotype were less prone to harbor the Blastocystis parasite. Finally, the CAG clustering method was applied to metagenomics data from the human nose- and oral-cavity. It was concluded...

  3. Metabolic reconstruction for metagenomic data and its application to the human microbiome.

    Directory of Open Access Journals (Sweden)

    Sahar Abubucker

    Full Text Available Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high
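
    The idea of deriving gene-family abundances and module presence directly from read hits can be illustrated with a toy calculation. This is a sketch under stated assumptions, not HUMAnN's algorithm or interface: the family identifiers, the per-family hit counts and the coverage definition (fraction of a module's families seen at least once) are all hypothetical.

```python
def relative_abundance(family_counts):
    """Convert per-family read-hit counts into relative abundances."""
    total = sum(family_counts.values())
    if total == 0:
        return {family: 0.0 for family in family_counts}
    return {family: count / total for family, count in family_counts.items()}


def module_coverage(module_families, family_counts):
    """Fraction of a module's gene families observed at least once.

    This is a deliberately simple stand-in for a presence/absence call.
    """
    if not module_families:
        return 0.0
    present = sum(1 for f in module_families if family_counts.get(f, 0) > 0)
    return present / len(module_families)


if __name__ == "__main__":
    # Hypothetical counts of reads assigned to each gene family.
    counts = {"FAM_A": 6, "FAM_B": 2, "FAM_C": 0}
    print(relative_abundance(counts))
    print(module_coverage(["FAM_A", "FAM_C"], counts))
```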

  4. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    Science.gov (United States)

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of culturable microorganisms in the environment, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  5. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  6. Infrared and visual image fusion through infrared feature extraction and visual information preservation

    Science.gov (United States)

    Zhang, Yu; Zhang, Lijia; Bai, Xiangzhi; Zhang, Li

    2017-06-01

    The ideal fusion of the infrared image and visual image should integrate the important bright features of the infrared image, and preserve much original visual information of the visual image. To achieve this purpose, we propose a simple, fast yet effective infrared and visual image fusion algorithm through infrared feature extraction and visual information preservation. Firstly, we take advantage of quadtree decomposition and Bézier interpolation to reconstruct the infrared background. Secondly, the infrared bright features are extracted by subtracting the reconstructed background from the infrared image and then refined by reducing the redundant background information. To inhibit the over-exposure problem, the refined infrared features are adaptively suppressed and then added on the visual image to achieve the final fusion image. In this way, the fusion image could not only reveal the invisible but important infrared objects by integrating the infrared bright features, but also show good visual quality by preserving much original visual information. Experiments performed on the commonly used image sets validate that the proposed algorithm outperforms several representative image fusion algorithms in most of the cases.

  7. An Accurate Integral Method for Vibration Signal Based on Feature Information Extraction

    Directory of Open Access Journals (Sweden)

    Yong Zhu

    2015-01-01

    Full Text Available After summarizing the advantages and disadvantages of current integral methods, a novel vibration signal integral method based on feature information extraction was proposed. This method took full advantage of the self-adaptive filter characteristic and waveform correction feature of ensemble empirical mode decomposition in dealing with nonlinear and nonstationary signals. This research merged the superiorities of kurtosis, mean square error, energy, and singular value decomposition on signal feature extraction. The values of the four indexes aforementioned were combined into a feature vector. Then, the connotative characteristic components in vibration signal were accurately extracted by Euclidean distance search, and the desired integral signals were precisely reconstructed. With this method, the interference problem of invalid signal such as trend item and noise which plague traditional methods is commendably solved. The great cumulative error from the traditional time-domain integral is effectively overcome. Moreover, the large low-frequency error from the traditional frequency-domain integral is successfully avoided. Comparing with the traditional integral methods, this method is outstanding at removing noise and retaining useful feature information and shows higher accuracy and superiority.

  8. The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech

    Directory of Open Access Journals (Sweden)

    Kun-Ching Wang

    2014-09-01

    Full Text Available In this paper, we present a novel texture image feature for Emotion Sensing in Speech (ESS). This idea is based on the fact that the texture images carry emotion-related information. The feature extraction is derived from time-frequency representation of spectrogram images. First, we transform the spectrogram as a recognizable image. Next, we use a cubic curve to enhance the image contrast. Then, the texture image information (TII) derived from the spectrogram image can be extracted by using Laws’ masks to characterize emotional state. In order to evaluate the effectiveness of the proposed emotion recognition in different languages, we use two open emotional databases including the Berlin Emotional Speech Database (EMO-DB) and eNTERFACE corpus and one self-recorded database (KHUSC-EmoDB), to evaluate the performance cross-corpora. The results of the proposed ESS system are presented using support vector machine (SVM) as a classifier. Experimental results show that the proposed TII-based feature extraction inspired by visual perception can provide significant classification for ESS systems. The two-dimensional (2-D) TII feature can provide the discrimination between different emotions in visual expressions except for the conveyance pitch and formant tracks. In addition, the de-noising in 2-D images can be more easily completed than de-noising in 1-D speech.
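
    Laws' masks are conventionally built as outer products of short one-dimensional vectors. The sketch below assumes the widely used 5-tap vectors (level, edge, spot, wave, ripple); the abstract does not state which mask set or size the authors used, so this is illustrative only.

```python
# Commonly cited 5-tap Laws vectors (an assumption; the abstract does not
# specify the mask set used in the paper).
VECTORS = {
    "L5": [1, 4, 6, 4, 1],      # level
    "E5": [-1, -2, 0, 2, 1],    # edge
    "S5": [-1, 0, 2, 0, -1],    # spot
    "W5": [-1, 2, 0, -2, 1],    # wave
    "R5": [1, -4, 6, -4, 1],    # ripple
}


def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[a * b for b in v] for a in u]


def laws_masks():
    """Return all 25 two-dimensional 5x5 masks keyed by vector-name pairs."""
    return {
        name_u + name_v: outer(u, v)
        for name_u, u in VECTORS.items()
        for name_v, v in VECTORS.items()
    }


if __name__ == "__main__":
    masks = laws_masks()
    for row in masks["L5E5"]:
        print(row)
```

    Each mask would then be convolved with the spectrogram image and the filter responses summarized into texture features; that step is omitted here.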

  9. A weighted information criterion for multiple minor components and its adaptive extraction algorithms.

    Science.gov (United States)

    Gao, Yingbin; Kong, Xiangyu; Zhang, Huihui; Hou, Li'an

    2017-05-01

    Minor components (MCs) play an important role in signal processing and data analysis, so it is valuable to develop MC extraction algorithms. Based on the concepts of weighted subspace and optimum theory, a weighted information criterion is proposed for searching the optimum solution of a linear neural network. This information criterion exhibits a unique global minimum attained if and only if the state matrix is composed of the desired MCs of an autocorrelation matrix of an input signal. By using gradient ascent method and recursive least square (RLS) method, two algorithms are developed for multiple MCs extraction. The global convergences of the proposed algorithms are also analyzed by the Lyapunov method. The proposed algorithms can extract the multiple MCs in parallel and have an advantage in dealing with high-dimension matrices. Since the weighted matrix does not require an accurate value, it facilitates the system design of the proposed algorithms for practical applications. The speed and computation advantages of the proposed algorithms are verified through simulations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Assessment of commercial NLP engines for medication information extraction from dictated clinical notes.

    Science.gov (United States)

    Jagannathan, V; Mullett, Charles J; Arbogast, James G; Halbritter, Kevin A; Yellapragada, Deepthi; Regulapati, Sushmitha; Bandaru, Pavani

    2009-04-01

    We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents. Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard with precision, recall, and F measures calculated. Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure=93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the strengths of the four vendors. The vendors performed better with the structured discharge summaries than with the clinic notes in an analysis comparing the two document types. Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention.

  11. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

    Science.gov (United States)

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

    2016-07-01

    Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
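
    The abstract reports precision and recall for the gene-alteration extraction system but no combined score. Assuming the usual balanced F measure (F1, the harmonic mean of precision and recall), the two reported values can be combined as in this short sketch; the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported for the gene-alteration extractor.
    print(round(f_measure(0.8300, 0.6803), 3))
```

    With the reported precision of 0.8300 and recall of 0.6803 this gives an F1 of about 0.748, a derived figure that does not appear in the abstract itself.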

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  13. System architecture for temporal information extraction, representation and reasoning in clinical narrative reports.

    Science.gov (United States)

    Zhou, Li; Friedman, Carol; Parsons, Simon; Hripcsak, George

    2005-01-01

    Exploring temporal information in narrative Electronic Medical Records (EMRs) is essential and challenging. We propose an architecture for an integrated approach to process temporal information in clinical narrative reports. The goal is to initiate and build a foundation that supports applications which assist healthcare practice and research by including the ability to determine the time of clinical events (e.g., past vs. present). Key components include: (1) an annotation schema for temporal expressions and the development of an associated tagger; (2) a natural language processing (NLP) system for encoding and extracting medical events and associating them with formalized temporal data; (3) a post-processor, with a knowledge-based subsystem to help discover implicit information, that resolves temporal expressions and deals with issues such as granularity and vagueness; and (4) a reasoning mechanism which models clinical reports as Simple Temporal Problems (STPs).
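
    A Simple Temporal Problem constrains the differences between time points to intervals, and a standard way to test consistency is to run all-pairs shortest paths on the distance graph and look for negative cycles. The sketch below uses that textbook formulation with Floyd-Warshall; the event names and day bounds are invented for illustration, and nothing here is taken from the authors' system.

```python
INF = float("inf")


def stp_consistent(n, constraints):
    """Check an STP with n time points.

    constraints: iterable of (i, j, lo, hi) meaning lo <= t_j - t_i <= hi.
    Returns (is_consistent, d), where d[i][j] is the tightest upper bound
    on t_j - t_i implied by all constraints.
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry signals a negative cycle, i.e. inconsistency.
    return all(d[i][i] >= 0 for i in range(n)), d


if __name__ == "__main__":
    ADMISSION, SURGERY, DISCHARGE = 0, 1, 2  # hypothetical clinical events
    constraints = [
        (ADMISSION, SURGERY, 1, 2),    # surgery 1 to 2 days after admission
        (SURGERY, DISCHARGE, 3, 5),    # discharge 3 to 5 days after surgery
        (ADMISSION, DISCHARGE, 0, 7),  # stay lasted at most 7 days
    ]
    ok, d = stp_consistent(3, constraints)
    print(ok, -d[DISCHARGE][ADMISSION], d[ADMISSION][DISCHARGE])
```

    In this example the network is consistent and the implied length of stay is between 4 and 7 days; tightening the last constraint to at most 3 days makes it inconsistent.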

  14. A catalog of the mouse gut metagenome

    DEFF Research Database (Denmark)

    Xiao, Liang; Feng, Qiang; Liang, Suisha

    2015-01-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.

  15. Snowball: Strain aware gene assembly of Metagenomes

    NARCIS (Netherlands)

    I. Gregor; A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2015-01-01

    Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied

  16. Tentacle: distributed quantification of genes in metagenomes.

    Science.gov (United States)

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
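
    The dynamic master-worker pattern described here, where a master hands out fragments and idle workers pull the next one, can be sketched on a single machine. This is an analogy only: threads and an in-memory queue stand in for networked worker nodes, `count_gc` is a placeholder for running a sequence aligner, and none of the names reflect Tentacle's actual interface.

```python
import queue
import threading


def count_gc(fragment):
    """Placeholder task: count G and C bases (stands in for alignment)."""
    return sum(1 for base in fragment if base in "GC")


def run_master_worker(fragments, n_workers=3, task=count_gc):
    """Distribute fragments to workers that pull work as they become free."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work for this worker
                break
            idx, fragment = item
            value = task(fragment)
            with lock:
                results[idx] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    # The master streams fragments into the shared queue.
    for idx, fragment in enumerate(fragments):
        tasks.put((idx, fragment))
    # One sentinel per worker signals shutdown.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    # Return results in the original fragment order.
    return [results[i] for i in range(len(fragments))]


if __name__ == "__main__":
    print(run_master_worker(["GGCC", "ATAT", "GATC"]))
```

    Because workers pull from the queue rather than receiving a fixed share up front, slow fragments do not stall the others, which is the load-balancing property a dynamic scheme aims for.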

  17. Snowball: strain aware gene assembly of metagenomes

    NARCIS (Netherlands)

    I. Gregor; A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2016-01-01

    Motivation: Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the

  18. Clustering metagenomic sequences with interpolated Markov models

    Directory of Open Access Journals (Sweden)

    Kelley David R

    2010-11-01

    Full Text Available Abstract Background Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHYSCIMM that performs better when evolutionarily close training genomes are available. Conclusions SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

  19. Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction.

    Science.gov (United States)

    Scuba, William; Tharp, Melissa; Mowery, Danielle; Tseytlin, Eugene; Liu, Yang; Drews, Frank A; Chapman, Wendy W

    2016-06-23

    Clinical Natural Language Processing (NLP) systems require a semantic schema comprised of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from the free texts. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who will represent clinical concepts created from the clinician's domain expertise into a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and the NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free-text. Knowledge Author is a web-based, recommendation system that supports users in developing domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and providing domain content suggestions through the mapping of concepts to the Unified Medical Language System Metathesaurus database further supports the domain content creation process. Two proof of concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. A dataset of 115 concepts was created of which 87 (76 %) were able to be created using Knowledge Author. The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology

  20. THE EXTRACTION OF INDOOR BUILDING INFORMATION FROM BIM TO OGC INDOORGML

    Directory of Open Access Journals (Sweden)

    T.-A. Teo

    2017-07-01

    Full Text Available Indoor Spatial Data Infrastructure (indoor-SDI) is an important SDI for geospatial analysis and location-based services. Building Information Model (BIM) has a high degree of detail in geometric and semantic information for buildings. This study proposed direct conversion schemes to extract indoor building information from BIM to OGC IndoorGML. The major steps of the research include (1) topological conversion from building model into indoor network model; and (2) generation of IndoorGML. The topological conversion is a major process of generating and mapping nodes and edges from IFC to IndoorGML. A node represents every space (e.g. IfcSpace) and object (e.g. IfcDoor) in the building, while an edge shows the relationship between nodes. According to the definition of IndoorGML, the topological model in the dual space is also represented as a set of nodes and edges. These definitions of IndoorGML are the same as in the indoor network. Therefore, we can extract the necessary data in the indoor network and easily convert them into IndoorGML based on the IndoorGML Schema. The experiment utilized a real BIM model to examine the proposed method. The experimental results indicated that the 3D indoor model (i.e. IndoorGML model) can be automatically imported from the IFC model by the proposed procedure. In addition, the geometry and attributes of building elements are completely and correctly converted from BIM to indoor-SDI.
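
    The topological conversion step, in which spaces and doors become nodes and their relationships become edges, can be illustrated with a small graph builder. This is a hedged sketch: the input format, the function names and the example identifiers are hypothetical, no IFC file is parsed and no IndoorGML XML is written.

```python
from collections import deque


def build_indoor_network(spaces, doors):
    """Build a node/edge network from spaces and the doors that connect them.

    spaces: iterable of space identifiers (playing the role of IfcSpace).
    doors:  dict mapping a door identifier (playing the role of IfcDoor)
            to the pair of spaces it connects.
    """
    nodes = {space: "IfcSpace" for space in spaces}
    edges = []
    for door, (space_a, space_b) in doors.items():
        for space in (space_a, space_b):
            if space not in nodes:
                raise ValueError(f"door {door!r} references unknown space {space!r}")
        nodes[door] = "IfcDoor"
        edges.append((space_a, door))
        edges.append((door, space_b))
    return nodes, edges


def reachable(edges, start):
    """Return all nodes reachable from start, treating edges as undirected."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    spaces = ["room1", "corridor", "room2", "storage"]
    doors = {"door1": ("room1", "corridor"), "door2": ("corridor", "room2")}
    nodes, edges = build_indoor_network(spaces, doors)
    print(sorted(reachable(edges, "room1")))
```

    A network in this form is the kind of node-and-edge structure that the abstract says maps onto IndoorGML's dual-space topology.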

  1. Information extraction approaches to unconventional data sources for "Injury Surveillance System": the case of newspapers clippings.

    Science.gov (United States)

    Berchialla, Paola; Scarinzi, Cecilia; Snidero, Silvia; Rahim, Yousif; Gregori, Dario

    2012-04-01

    Injury Surveillance Systems based on traditional hospital records or clinical data have the advantage of being a well established, highly reliable source of information for making an active surveillance on specific injuries, like choking in children. However, they suffer the drawback of delays in making data available to the analysis, due to inefficiencies in data collection procedures. In this sense, the integration of clinical based registries with unconventional data sources like newspaper articles has the advantage of making the system more useful for early alerting. Usage of such sources is difficult since information is only available in the form of free natural-language documents rather than structured databases as required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion/inhalation of foreign body, we compared the performance of three IE algorithms: (a) a classical rule based system which requires a manual annotation of the rules; (b) a rule based system which allows for the automatic building of rules; (c) a machine learning method based on Support Vector Machine. Although some useful indications are extracted from the newspaper clippings, this approach is at the time far from being routinely implemented for injury surveillance purposes.

  2. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    Science.gov (United States)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires the knowledge of scanning parameters and patients’ information included in a DICOM file as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value in every ROI. The result is stored in DICOM format, for data and trend analysis. The developed GUI is easy and fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.

  3. Metagenomic analyses of bacteria on human hairs: a qualitative assessment for applications in forensic science.

    Science.gov (United States)

    Tridico, Silvana R; Murray, Dáithí C; Addison, Jayne; Kirkbride, Kenneth P; Bunce, Michael

    2014-01-01

    Mammalian hairs are one of the most ubiquitous types of trace evidence collected in the course of forensic investigations. However, hairs that are naturally shed or that lack roots are problematic substrates for DNA profiling; these hair types often contain insufficient nuclear DNA to yield short tandem repeat (STR) profiles. Whilst there have been a number of initial investigations evaluating the value of metagenomics analyses for forensic applications (e.g. examination of computer keyboards), there have been no metagenomic evaluations of human hairs-a substrate commonly encountered during forensic practice. This present study attempts to address this forensic capability gap, by conducting a qualitative assessment into the applicability of metagenomic analyses of human scalp and pubic hair. Forty-two DNA extracts obtained from human scalp and pubic hairs generated a total of 79,766 reads, yielding 39,814 reads post control and abundance filtering. The results revealed the presence of unique combinations of microbial taxa that can enable discrimination between individuals and signature taxa indigenous to female pubic hairs. Microbial data from a single co-habiting couple added an extra dimension to the study by suggesting that metagenomic analyses might be of evidentiary value in sexual assault cases when other associative evidence is not present. Of all the data generated in this study, the next-generation sequencing (NGS) data generated from pubic hair held the most potential for forensic applications. Metagenomic analyses of human hairs may provide independent data to augment other forensic results and possibly provide association between victims of sexual assault and offender when other associative evidence is absent. Based on results garnered in the present study, we believe that with further development, bacterial profiling of hair will become a valuable addition to the forensic toolkit.

  4. Metagenomic Analysis of Koumiss in Kazakhstan

    Directory of Open Access Journals (Sweden)

    Samat Kozhakhmetov

    2014-12-01

    Introduction. Koumiss is a low-alcohol product made from fermented mare's milk, which is popular in Kazakhstan, Russia, and other countries of Central Asia, China, and Mongolia. Natural mare's milk is fermented by a symbiosis of two types of microorganisms (lactobacteria and yeast). Koumiss's microbial composition varies depending on geographical, climatic, and cultural conditions. Based on phenotypic characteristics of samples, Wu, R. and colleagues identified the following bacteria isolated in Inner Mongolia, an autonomous region of China: L. casei, L. helveticus, L. plantarum, L. coryniformis subsp. coryniformis, L. paracasei, L. kefiranofaciens, L. curvatus, L. fermentum, and W. kandleri. Studies of the yeast composition of koumiss also showed significant variation: of 87 isolated yeast cultures, 48.3% of isolates were related to Saccharomyces unisporus, 27.6% to Kluyveromyces marxianus, 15.0% to Pichia membranaefaciens, and 9.2% to Saccharomyces cerevisiae. The purpose of this study was to examine the bacterial composition of koumiss. Methods. To extract DNA, 1.8 ml of fermented milk was centrifuged to generate a pellet, which was suspended in 450 µl of lysis buffer P1 from the Powerfood Microbial DNA Isolation kit (MoBio Laboratories Inc, USA). The composition of the microflora was determined by amplifying fragments of the 16S rRNA gene and ITS1. A plasmid library with the target insert was obtained using the high-copy plasmid vector pGem-T. Direct nucleotide sequencing was performed by the Sanger method using the BigDye Terminator v 3.1 Cycle Sequencing Kit on an ABI 3730xl automatic genetic analyzer (Applied Biosystems, USA). The Informax Vector NTI Suite 9 and Sequence Scanner v 1.0 software packages were used for the analysis. Results. Our studies showed that in most samples of koumiss isolated from the Akmola region (Central Kazakhstan), the following bacteria species prevailed

  5. Metagenomic characterization of viral communities in Goseong Bay, Korea

    Science.gov (United States)

    Hwang, Jinik; Park, So Yun; Park, Mirye; Lee, Sukchan; Jo, Yeonhwa; Cho, Won Kyong; Lee, Taek-Kyun

    2016-12-01

    In this study, seawater samples were collected from Goseong Bay, Korea in March 2014 and viral populations were examined by metagenomics assembly. Enrichment of marine viral particles using FeCl3 followed by next-generation sequencing produced numerous sequences. De novo assembly and BLAST search showed that most of the obtained contigs were unknown sequences and only 0.74% of sequences were associated with known viruses. As a result, 138 viruses, including bacteriophages (87%), viruses infecting algae and others (13%) were identified. The identified 138 viruses were divided into 11 orders, 14 families, 34 genera, and 133 species. The dominant viruses were Pelagibacter phage HTVC010P and Roseobacter phage SIO1. The viruses infecting algae, including the Ostreococcus species, accounted for 9.4% of total identified viruses. In addition, we identified pathogenic herpes viruses infecting fishes and giant viruses infecting parasitic acanthamoeba species. This is a comprehensive study to reveal the viral populations in the Goseong Bay using metagenomics. The information associated with the marine viral community in Goseong Bay, Korea will be useful for comparative analysis in other marine viral communities.

  6. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information

    Science.gov (United States)

    Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia

    2018-05-01

    Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.

  7. Extracting Low-Frequency Information from Time Attenuation in Elastic Waveform Inversion

    Science.gov (United States)

    Guo, Xuebao; Liu, Hong; Shi, Ying; Wang, Weihong

    2017-03-01

    Low-frequency information is crucial for recovering background velocity, but the lack of low-frequency information in field data makes inversion impractical without accurate initial models. Laplace-Fourier domain waveform inversion can recover a smooth model from real data without low-frequency information, which can be used for subsequent inversion as an ideal starting model. In general, it also starts with low frequencies and includes higher frequencies at later inversion stages; the difference is that its ultralow-frequency information comes from the Laplace-Fourier domain. Meanwhile, a direct implementation of the Laplace-transformed wavefield using frequency domain inversion is also very convenient. However, because broad frequency bands are often used in pure time domain waveform inversion, it is difficult to extract the wavefields dominated by low frequencies in this case. In this paper, low-frequency components are constructed by introducing time attenuation into the recorded residuals, and the rest of the method is identical to the traditional time domain inversion. Time windowing and frequency filtering are also applied to mitigate the ambiguity of the inverse problem. Therefore, we can start at low frequencies and move to higher frequencies. The experiment shows that the proposed method can achieve a good inversion result in the presence of a linear initial model and records without low-frequency information.
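
    The effect of time attenuation on a band-limited residual can be illustrated with a small numerical sketch. This is not the authors' implementation: the Ricker wavelet, the damping constant `sigma`, the sampling interval and the 3 Hz cutoff are all assumptions chosen for illustration. It multiplies a synthetic residual by exp(-sigma * t) and compares the share of spectral energy at low frequencies before and after damping.

```python
import cmath
import math


def ricker(t, peak_hz):
    """Ricker wavelet centred at t = 0 with the given peak frequency."""
    a = (math.pi * peak_hz * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def low_band_fraction(signal, dt, cutoff_hz):
    """Share of spectral energy at or below cutoff_hz (naive DFT, non-negative frequencies)."""
    n = len(signal)
    low = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(signal))
        energy = abs(coeff) ** 2
        total += energy
        if k / (n * dt) <= cutoff_hz:
            low += energy
    return low / total


# All values below are illustrative assumptions, not taken from the paper.
dt = 0.004      # sampling interval in seconds
n = 256         # number of samples
t0 = 0.4        # arrival time of the synthetic residual event
sigma = 20.0    # time-attenuation (damping) constant in 1/s

residual = [ricker(i * dt - t0, 15.0) for i in range(n)]
damped = [r * math.exp(-sigma * i * dt) for i, r in enumerate(residual)]

plain_fraction = low_band_fraction(residual, dt, 3.0)
damped_fraction = low_band_fraction(damped, dt, 3.0)
print(f"energy share below 3 Hz: plain={plain_fraction:.5f}, damped={damped_fraction:.5f}")
```

    In this toy setting the damped trace carries a larger share of its energy below the cutoff, which is the property the abstract relies on when it says low-frequency components are constructed from the residuals.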

  8. DEVELOPMENT OF AUTOMATIC EXTRACTION METHOD FOR ROAD UPDATE INFORMATION BASED ON PUBLIC WORK ORDER OUTLOOK

    Science.gov (United States)

    Sekimoto, Yoshihide; Nakajo, Satoru; Minami, Yoshitaka; Yamaguchi, Syohei; Yamada, Harutoshi; Fuse, Takashi

    Recently, the disclosure of statistical data representing the financial effects or burden of public works, through the web sites of national and local governments, has made it possible to discuss macroscopic financial trends. However, it is still difficult to grasp, nationwide, how each location was changed by public works. The purpose of this research is to reasonably collect the road update information that various road managers provide, in order to realize efficient updating of various maps such as car navigation maps. In particular, we develop a system that, by combining several web mining technologies, automatically extracts the relevant public works from the public work order outlooks released by each local government and registers summaries, including position information, in a database. Finally, we collect and register several tens of thousands of entries from web sites all over Japan and confirm the feasibility of our method.

  9. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  10. Approaching the largest ‘API’: extracting information from the Internet with Python

    Directory of Open Access Journals (Sweden)

    Jonathan E. Germann

    2018-02-01

    This article explores the need for libraries to algorithmically access and manipulate the world’s largest API: the Internet. The billions of pages on the ‘Internet API’ (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of any size. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
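
    As a rough illustration of the kind of automated extraction described here, the sketch below pulls link text and targets out of an HTML page. It is not the article's code: to stay dependency-free it uses the standard-library `html.parser` module in place of BeautifulSoup, and `SAMPLE_PAGE`, `LinkCollector` and `extract_links` are hypothetical names invented for the example. In practice the HTML string would typically come from a fetch such as `urllib.request.urlopen(url).read()`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from anchor tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return every (text, href) pair found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a downloaded document.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li><a href="/records/1">Record one</a></li>
    <li><a href="/records/2">Record <b>two</b></a></li>
    <li><a name="anchor-only">No link here</a></li>
  </ul>
</body></html>
"""

links = extract_links(SAMPLE_PAGE)
for text, href in links:
    print(f"{text} -> {href}")
```

    Anchors without an `href` attribute are skipped, and nested markup inside a link is flattened to its text, which is usually what a custom dataset needs.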

  11. Ten years of maintaining and expanding a microbial genome and metagenome analysis system.

    Science.gov (United States)

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2015-11-01

    Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. EXTRACTING URBAN GROUND OBJECT INFORMATION FROM IMAGES AND LiDAR DATA

    Directory of Open Access Journals (Sweden)

    L. Yi

    2016-06-01

    To deal with the problem of urban ground object information extraction, the paper proposes an object-oriented classification method using aerial image and LiDAR data. First, we select the optimal segmentation scales of different ground objects and synthesize them to get accurate object boundaries. Then, the ReliefF algorithm is used to select the optimal feature combination and eliminate the Hughes phenomenon. Finally, a multiple classifier combination method is applied to obtain the classification result. To validate the feasibility of this method, two experimental regions in Stuttgart, Germany are selected (Regions A and B, covering 0.21 km2 and 1.1 km2, respectively). The aim of the first experiment, on Region A, is to obtain the optimal segmentation scales and classification features; the overall accuracy of the classification reaches 93.3%. The purpose of the experiment on Region B is to validate the applicability of this method to a large area, where it reaches an overall accuracy of 88.4%. The paper concludes that the proposed method performs accurately and efficiently for urban ground information extraction and is of high application value.
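
    The overall accuracy figures quoted above are conventionally computed from a confusion matrix as the share of correctly classified samples. A minimal sketch follows, assuming a square matrix whose rows are reference classes and whose columns are predicted classes; the counts are invented for illustration and are not the paper's data.

```python
def overall_accuracy(confusion):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical 3-class confusion matrix (e.g. building, vegetation, road).
confusion = [
    [50, 2, 1],
    [3, 40, 2],
    [1, 1, 50],
]

accuracy = overall_accuracy(confusion)
print(f"overall accuracy: {accuracy:.1%}")
```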

  13. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    Science.gov (United States)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.

  14. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.

    Science.gov (United States)

    De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria

    2017-11-01

    The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. 
    Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify
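
    The idea of scoring a sample from per-domain relative entropy can be sketched as follows. This is an illustration under stated assumptions, not the MEBS code: each domain's contribution is taken to be the single Kullback-Leibler term p * log2(p / q), where p is the domain's frequency among the genomes of interest and q its frequency in the background set, and a sample's score is the sum over the domains it contains. The domain names and frequencies are hypothetical.

```python
import math


def relative_entropy_term(p_target, q_background):
    """Single-domain contribution p * log2(p / q), in bits (0 when p is 0)."""
    if p_target == 0.0:
        return 0.0
    return p_target * math.log2(p_target / q_background)


def sample_score(domains_present, domain_entropy):
    """Sum the entropy of every scored domain found in a (meta)genomic sample."""
    return sum(domain_entropy.get(d, 0.0) for d in domains_present)


# Hypothetical (target frequency, background frequency) pairs per domain.
frequencies = {
    "DOMAIN_ENRICHED": (0.90, 0.10),
    "DOMAIN_NEUTRAL": (0.50, 0.50),
    "DOMAIN_DEPLETED": (0.05, 0.40),
}

domain_entropy = {
    name: relative_entropy_term(p, q) for name, (p, q) in frequencies.items()
}

# A hypothetical sample containing two scored domains and one unscored one.
score = sample_score({"DOMAIN_ENRICHED", "DOMAIN_NEUTRAL", "UNSCORED"}, domain_entropy)
print(domain_entropy, score)
```

    Domains that are equally common in both sets contribute nothing, enriched domains contribute positively and depleted ones negatively, so a high total suggests that the machinery of interest is present.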

  15. Zone analysis in biology articles as a basis for information extraction.

    Science.gov (United States)

    Mizuta, Yoko; Korhonen, Anna; Mullen, Tony; Collier, Nigel

    2006-06-01

    In the field of biomedicine, an overwhelming amount of experimental data has become available as a result of the high throughput of research in this domain. The amount of results reported has now grown beyond the limits of what can be managed by manual means. This makes it increasingly difficult for the researchers in this area to keep up with the latest developments. Information extraction (IE) in the biological domain aims to provide an effective automatic means to dynamically manage the information contained in archived journal articles and abstract collections and thus help researchers in their work. However, while considerable advances have been made in certain areas of IE, pinpointing and organizing factual information (such as experimental results) remains a challenge. In this paper we propose tackling this task by incorporating into IE information about rhetorical zones, i.e. classification of spans of text in terms of argumentation and intellectual attribution. As the first step towards this goal, we introduce a scheme for annotating biological texts for rhetorical zones and provide a qualitative and quantitative analysis of the data annotated according to this scheme. We also discuss our preliminary research on automatic zone analysis, and its incorporation into our IE framework.

  16. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze these data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled, and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  17. Nonlinear electrophoresis for purification of soil DNA for metagenomics.

    Science.gov (United States)

    Engel, Katja; Pinnell, Lee; Cheng, Jiujun; Charles, Trevor C; Neufeld, Josh D

    2012-01-01

    Purification of microbial DNA from soil is challenging due to the co-extraction of humic acids and associated phenolic compounds that inhibit subsequent cloning, amplification or sequencing. Removal of these contaminants is critical for the success of metagenomic library construction and high-throughput sequencing of extracted DNA. Using three different composite soil samples, we compared a novel DNA purification technique using nonlinear electrophoresis on the synchronous coefficient of drag alteration (SCODA) instrument with alternate purification methods such as direct current (DC) agarose gel electrophoresis followed by gel filtration or anion exchange chromatography, Wizard DNA Clean-Up System, and the PowerSoil DNA Isolation kit. Both nonlinear and DC electrophoresis were effective at retrieving high-molecular weight DNA with high purity, suitable for construction of large-insert libraries. The PowerSoil DNA Isolation kit and the nonlinear electrophoresis had high recovery of high purity DNA suitable for sequencing purposes. All methods demonstrated high consistency in the bacterial community profiles generated from the DNA extracts. Nonlinear electrophoresis using the SCODA instrument was the ideal methodology for the preparation of soil DNA samples suitable for both high-throughput sequencing and large-insert cloning applications. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. An Experimental Metagenome Data Management and Analysis System

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. In this paper we present IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  19. SmashCommunity: A metagenomic annotation and analysis tool

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U

    2010-01-01

    SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quanti...

  20. Metagenomic Detection Methods in Biopreparedness Outbreak Scenarios

    DEFF Research Database (Denmark)

    Karlsson, Oskar Erik; Hansen, Trine; Knutsson, Rickard

    2013-01-01

    In the field of diagnostic microbiology, rapid molecular methods are critically important for detecting pathogens. With rapid and accurate detection, preventive measures can be put in place early, thereby preventing loss of life and further spread of a disease. From a preparedness perspective...... of a clinical sample, creating a metagenome, in a single week of laboratory work. As new technologies emerge, their dissemination and capacity building must be facilitated, and criteria for use, as well as guidelines on how to report results, must be established. This article focuses on the use of metagenomics......, from sample collection to data analysis and to some extent NGS, for the detection of pathogens, the integration of the technique in outbreak response systems, and the risk-based evaluation of sample processing in routine diagnostics labs. The article covers recent advances in the field, current debate...

  1. Automated Information Extraction on Treatment and Prognosis for Non-Small Cell Lung Cancer Radiotherapy Patients: Clinical Study.

    Science.gov (United States)

    Zheng, Shuai; Jabbour, Salma K; O'Reilly, Shannon E; Lu, James J; Dong, Lihua; Ding, Lijuan; Xiao, Ying; Yue, Ning; Wang, Fusheng; Zou, Wei

    2018-02-01

    In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis to develop evidence-based radiation therapy paradigms. These data are available mainly in forms of narrative texts or table formats with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC). We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. The adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data, so as to generate structured output. In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall more than 93% and 83%, respectively. For toxicity information extractions, which are important to study patient posttreatment side effects and quality of life, the precision and recall achieved 95.7% and 94.5% respectively. The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies. ©Shuai Zheng, Salma K Jabbour, Shannon E O'Reilly, James J Lu, Lihua Dong, Lijuan Ding, Ying Xiao, Ning Yue, Fusheng Wang, Wei Zou. Originally published in JMIR Medical Informatics (http
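
    Precision and recall, the two measures reported above, follow directly from counts of true positives, false positives and false negatives. The sketch below shows the standard definitions; the counts are invented for illustration and are not the study's data.

```python
def precision(tp, fp):
    """Share of extracted items that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of items in the reference annotation that were extracted."""
    return tp / (tp + fn)


# Hypothetical counts for one extracted field.
true_positives = 90    # extracted and correct
false_positives = 10   # extracted but wrong
false_negatives = 30   # present in the documents but missed

p = precision(true_positives, false_positives)
r = recall(true_positives, false_negatives)
print(f"precision={p:.1%}, recall={r:.1%}")
```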

  2. Explaining diversity in metagenomic datasets by phylogenetic-based feature weighting.

    Directory of Open Access Journals (Sweden)

    Davide Albanese

    2015-03-01

    Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, and thus also includes in the analysis those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance, measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

  3. Exploration of soil metagenome diversity for prospection of enzymes involved in lignocellulosic biomass conversion

    Energy Technology Data Exchange (ETDEWEB)

    Alvarez, T.M.; Squina, F.M. [Laboratorio Nacional de Luz Sincrotron (LNLS), Campinas, SP (Brazil); Paixao, D.A.A.; Franco Cairo, J.P.L.; Buchli, F.; Ruller, R. [Laboratorio Nacional de Ciencia e Tecnologia do Bioetanol (CTBE), Campinas, SP (Brazil); Prade, R. [Oklahoma State University, Stillwater, OK (United States)

    2012-07-01

    Metagenomics allows access to the genetic information encoded in the DNA of microorganisms recalcitrant to cultivation. These microorganisms represent a reservoir of novel biocatalysts with potential application in environmentally friendly techniques that aim to overcome the dependence on fossil fuels and to diminish air and water pollution. The focus of our work is the generation of a toolkit of lignocellulolytic enzymes from the soil metagenome, which could be used for second-generation ethanol production. Environmental samples were collected at a sugarcane field after harvesting, where the microbial population involved in lignocellulose degradation is expected to be enriched due to the presence of straw covering the soil. The Sugarcane Bagasse-Degrading-Soil (SBDS) metagenome was sequenced by massively parallel 454 (Roche) sequencing. We identified a full repertoire of genes with significant matches to glycosyl hydrolase catalytic domains and carbohydrate-binding modules. Soil metagenomic libraries cloned into pUC19 were screened through functional assays. CMC-agar screening resulted in positive clones, revealing new cellulase-coding genes. A CMC-zymogram showed that one of these genes, designated E-1, corresponds to an enzyme that is secreted to the extracellular medium, suggesting that the cloned gene carried the original signal peptide. Enzymatic assays and analysis by capillary electrophoresis showed that E-1 was able to cleave internal glycosidic bonds of cellulose. New rounds of functional screening with chromogenic substrates are being conducted, aiming at the generation of a library of lignocellulolytic enzymes derived from the soil metagenome, which may become a key component in the development of second-generation biofuels. (author)

  4. EXTRACTING TEMPORAL AND SPATIAL DISTRIBUTIONS INFORMATION ABOUT ALGAL GLOOMS BASED ON MULTITEMPORAL MODIS

    Directory of Open Access Journals (Sweden)

    L. Chunguang

    2012-07-01

    Based on MODIS remote sensing data, a method and technology for extracting the temporal and spatial distribution of algal blooms is studied and established. The dynamic temporal and spatial features of blooms in Taihu Lake from 2009 to 2011 can be obtained with the extraction method. The variation of cyanobacterial blooms in Taihu Lake is analyzed and discussed. The algae bloom frequency index (AFI) and the algae bloom sustainability index (ASI) are important criteria that show the interannual and inter-monthly variation over the whole area or the subregions of Taihu Lake. Using the AFI and ASI from 2009 to 2011, several phenomena were found: the blooming frequency decreased from the north and west to the east and south of Taihu Lake. The monthly variation of the AFI within each year shows that blooming has twin peaks, with a high level of fluctuation and a general lag trend. In the subregion statistics, the IBD and ASI in 2011 show an abnormal condition at the border between Gongshan Bay and the Central Lake. The date is obviously earlier than in the same subregion in previous years and in the other subregions in the same year.

  5. Enriching a document collection by integrating information extraction and PDF annotation

    Science.gov (United States)

    Powley, Brett; Dale, Robert; Anisimoff, Ilya

    2009-01-01

    Modern digital libraries offer all the hyperlinking possibilities of the World Wide Web: when a reader finds a citation of interest, in many cases she can now click on a link to be taken to the cited work. This paper presents work aimed at providing the same ease of navigation for legacy PDF document collections that were created before the possibility of integrating hyperlinks into documents was ever considered. To achieve our goal, we need to carry out two tasks: first, we need to identify and link citations and references in the text with high reliability; and second, we need the ability to determine physical PDF page locations for these elements. We demonstrate the use of a high-accuracy citation extraction algorithm which significantly improves on earlier reported techniques, and a technique for integrating PDF processing with a conventional text-stream based information extraction pipeline. We demonstrate these techniques in the context of a particular document collection, this being the ACL Anthology; but the same approach can be applied to other document sets.

  6. Videomicroscopic extraction of specific information on cell proliferation and migration in vitro

    International Nuclear Information System (INIS)

    Debeir, Olivier; Megalizzi, Veronique; Warzee, Nadine; Kiss, Robert; Decaestecker, Christine

    2008-01-01

    In vitro cell imaging is a useful exploratory tool for cell behavior monitoring with a wide range of applications in cell biology and pharmacology. Combined with appropriate image analysis techniques, this approach has been shown to provide useful information on the detection and dynamic analysis of cell events. In this context, numerous efforts have been focused on cell migration analysis. In contrast, the cell division process has been the subject of fewer investigations. The present work focuses on this latter aspect and shows that, in complement to cell migration data, interesting information related to cell division can be extracted from phase-contrast time-lapse image series, in particular cell division duration, which is not provided by standard cell assays using endpoint analyses. We illustrate our approach by analyzing the effects induced by two sigma-1 receptor ligands (haloperidol and 4-IBP) on the behavior of two glioma cell lines using two in vitro cell models, i.e., the low-density individual cell model and the high-density scratch wound model. This illustration also shows that the data provided by our approach are suggestive as to the mechanism of action of compounds, and are thus capable of informing the appropriate selection of further time-consuming and more expensive biological evaluations required to elucidate a mechanism

  7. Feature extraction and learning using context cue and Rényi entropy based mutual information

    DEFF Research Database (Denmark)

    Pan, Hong; Olsen, Søren Ingvor; Zhu, Yaping

    2015-01-01

    Feature extraction and learning play a critical role for visual perception tasks. We focus on improving the robustness of the kernel descriptors (KDES) by embedding context cues and further learning a compact and discriminative feature codebook for feature reduction using Rényi entropy based mutu....... Experimental results show that our method has promising potential for visual object recognition and detection applications....... as the information about the underlying labels of the CKD using CSQMI. Thus the resulting codebook and reduced CKD are discriminative. We verify the effectiveness of our method on several public image benchmark datasets such as YaleB, Caltech-101 and CIFAR-10, as well as a challenging chicken feet dataset of our own...

  8. Developing a Process Model for the Forensic Extraction of Information from Desktop Search Applications

    Directory of Open Access Journals (Sweden)

    Timothy Pavlic

    2008-03-01

    Desktop search applications can contain cached copies of files that were deleted from the file system. Forensic investigators see this as a potential source of evidence, as documents deleted by suspects may still exist in the cache. Whilst there have been attempts at recovering data collected by desktop search applications, there is no methodology governing the process, nor discussion on the most appropriate means to do so. This article seeks to address this issue by developing a process model that can be applied when developing an information extraction application for desktop search applications, discussing preferred methods and the limitations of each. This work represents a more structured approach than other forms of current research.

  9. 5W1H Information Extraction with CNN-Bidirectional LSTM

    Science.gov (United States)

    Nurdin, A.; Maulidevi, N. U.

    2018-03-01

    In this work, information about who, did what, when, where, why, and how on Indonesian news articles were extracted by combining Convolutional Neural Network and Bidirectional Long Short-Term Memory. Convolutional Neural Network can learn semantically meaningful representations of sentences. Bidirectional LSTM can analyze the relations among words in the sequence. We also use word embedding word2vec for word representation. By combining these algorithms, we obtained F-measure 0.808. Our experiments show that CNN-BLSTM outperforms other shallow methods, namely IBk, C4.5, and Naïve Bayes with the F-measure 0.655, 0.645, and 0.595, respectively.
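
    The F-measure values quoted above combine precision and recall into a single score. As a minimal sketch, assuming the balanced F1 form (the abstract does not state which weighting was used), the calculation could look like this; the function names and the example counts are illustrative and do not come from the paper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts; 0.0 when a denominator is zero."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not data from the paper).
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
score = f_measure(p, r)
```

    Because the harmonic mean is dominated by the smaller of the two inputs, a system cannot reach a high F-measure by maximizing recall alone.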

  10. 3D information extraction based on a novel x ray imaging system

    Science.gov (United States)

    Yu, Chunyu; Kong, Lingli; Zhang, Junju; Zhang, Shengdong

    2011-08-01

    In this paper, a novel x-ray imaging system is introduced. It is a CCD-based system but, unlike the traditional CCD-based x-ray imaging system composed of the x-ray intensifying screen, the CCD and the low-light-level image intensifier, it uses a zoom lens for coupling. The zoom lens gives a continuously variable visual field, which not only reduces the geometrical blur but can also produce several image pairs for stereo imaging. This is convenient for three-dimensional information extraction from a group of two-dimensional x-ray images and is valuable for stereovision radiography in medical diagnosis, security checking, non-destructive testing, and industrial detection. This stereo imaging method can also serve as a reference for three-dimensional reconstruction in daily living.

  11. Quantitative evaluation of translational medicine based on scientometric analysis and information extraction.

    Science.gov (United States)

    Zhang, Yin; Diao, Tianxi; Wang, Lei

    2014-12-01

    Designed to advance the two-way translational process between basic research and clinical practice, translational medicine has become one of the most important areas in biomedicine. The quantitative evaluation of translational medicine is valuable for decision making in global translational medical research and funding. Using scientometric analysis and information extraction techniques, this study quantitatively analyzed the scientific articles on translational medicine. The results showed that translational medicine had significant scientific output and impact, specific core fields and institutes, and outstanding academic status and benefit. Although not considered in this study, patent data are another important indicator that should be integrated into relevant research in the future. © 2014 Wiley Periodicals, Inc.

  12. EnvMine: A text-mining system for the automatic extraction of contextual information

    Directory of Open Access Journals (Sweden)

    de Lorenzo Victor

    2010-06-01

    Background: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be given in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of the distance between individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. This tool can help in determining the precise location and physicochemical
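
    Two ideas in this abstract lend themselves to a small illustration: finding physicochemical values by spotting their units of measurement, and computing the distance between two locations from their coordinates. The sketch below is not EnvMine's implementation; the unit list, the function names and the sample sentence are hypothetical, and the great-circle (haversine) formula is simply one common way to compute such a distance.

```python
import math
import re

# Hypothetical, deliberately tiny unit lexicon; a real system would need far more units.
# Longer alternatives come first so that "mg/L" and "mM" are tried before "m".
UNIT_PATTERN = re.compile(
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>°C|mg/L|g/L|mM|psu|m)(?![A-Za-z/])"
)


def find_measurements(text: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs for every number followed by a known unit."""
    return [(float(m.group("value")), m.group("unit")) for m in UNIT_PATTERN.finditer(text)]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp against rounding error before taking the arcsine.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical sample sentence, not taken from any article.
sample = "Samples were taken at 12.5 °C and a salinity of 35 psu at 20 m depth."
measurements = find_measurements(sample)
```

    Anchoring the match on the unit rather than on the variable name means that a sentence can be mined without knowing in advance how the authors phrased "temperature" or "salinity".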

  13. Metagenomics of microbial life in extreme temperature environments.

    Science.gov (United States)

    Lewin, Anna; Wentzel, Alexander; Valla, Svein

    2013-06-01

    Microbial life in extreme environments is attracting broad scientific interest. Knowledge about it helps in defining the boundaries for life to exist, and organisms living under extreme conditions are also interesting sources for enzymes with unusual and desirable properties. The tremendous progress in DNA sequencing technologies now makes it relatively easy to gain a representative overview of the composition of such communities, and many community studies have in the last decade applied metagenomics to characterize habitats extreme in, for example, temperature, salt and acidity. The future challenges in the field are likely to become more and more related to the conversion of the expected massive amounts of sequence information into an understanding of the corresponding biological community functions. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.

  15. Extracting contextual information in digital imagery: applications to automatic target recognition and mammography

    Science.gov (United States)

    Spence, Clay D.; Sajda, Paul; Pearson, John C.

    1996-02-01

    An important problem in image analysis is finding small objects in large images. The problem is challenging because (1) searching a large image is computationally expensive, and (2) small targets (on the order of a few pixels in size) have relatively few distinctive features which enable them to be distinguished from non-targets. To overcome these challenges we have developed a hierarchical neural network (HNN) architecture which combines multi-resolution pyramid processing with neural networks. The advantages of the architecture are: (1) both neural network training and testing can be done efficiently through coarse-to-fine techniques, and (2) such a system is capable of learning low-resolution contextual information to facilitate the detection of small target objects. We have applied this neural network architecture to two problems in which contextual information appears to be important for detecting small targets. The first problem is one of automatic target recognition (ATR), specifically the problem of detecting buildings in aerial photographs. The second problem focuses on a medical application, namely searching mammograms for microcalcifications, which are cues for breast cancer. Receiver operating characteristic (ROC) analysis suggests that the hierarchical architecture improves the detection accuracy for both the ATR and microcalcification detection problems, reducing false positive rates by a significant factor. In addition, we have examined the hidden units at various levels of the processing hierarchy and found what appears to be representations of road location (for the ATR example) and ductal/vasculature location (for mammography), both of which are in agreement with the contextual information used by humans to find these classes of targets. We conclude that this hierarchical neural network architecture is able to automatically extract contextual information in imagery and utilize it for target detection.

  16. Stable isotope probing in the metagenomics era: a bridge towards improved bioremediation

    Science.gov (United States)

    Uhlik, Ondrej; Leewis, Mary-Cathrine; Strejcek, Michal; Musilova, Lucie; Mackova, Martina; Leigh, Mary Beth; Macek, Tomas

    2012-01-01

    Microbial biodegradation and biotransformation reactions are essential to most bioremediation processes, yet the specific organisms, genes, and mechanisms involved are often not well understood. Stable isotope probing (SIP) enables researchers to directly link microbial metabolic capability to phylogenetic and metagenomic information within a community context by tracking isotopically labeled substances into phylogenetically and functionally informative biomarkers. SIP is thus applicable as a tool for the identification of active members of the microbial community and associated genes integral to the community functional potential, such as biodegradative processes. The rapid evolution of SIP over the last decade and integration with metagenomics provides researchers with a much deeper insight into potential biodegradative genes, processes, and applications, thereby enabling an improved mechanistic understanding that can facilitate advances in the field of bioremediation. PMID:23022353

  17. How to Mine Information from Each Instance to Extract an Abbreviated and Credible Logical Rule

    Directory of Open Access Journals (Sweden)

    Limin Wang

    2014-10-01

    Decision trees are particularly promising in symbolic representation and reasoning due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, their drawbacks, caused by the single-tree structure, cannot be ignored. A rigid decision path may cause the majority class to overwhelm other classes when dealing with imbalanced data sets, and pruning removes not only superfluous nodes, but also subtrees. The proposed learning algorithm, flexible hybrid decision forest (FHDF), mines information implicated in each instance to form logical rules on the basis of a chain rule of local mutual information, then forms different decision tree structures and decision forests later. The most credible decision path from the decision forest can be selected to make a prediction. Furthermore, functional dependencies (FDs), which are extracted from the whole data set based on association rule analysis, perform embedded attribute selection to remove nodes rather than subtrees, thus helping to achieve different levels of knowledge representation and improve model comprehension in the framework of semi-supervised learning. Naive Bayes replaces the leaf nodes at the bottom of the tree hierarchy, where the conditional independence assumption may hold. This technique reduces the potential for overfitting and overtraining and improves the prediction quality and generalization. Experimental results on UCI data sets demonstrate the efficacy of the proposed approach.

  18. Unsupervised Symbolization of Signal Time Series for Extraction of the Embedded Information

    Directory of Open Access Journals (Sweden)

    Yue Li

    2017-03-01

    This paper formulates an unsupervised algorithm for symbolization of signal time series to capture the embedded dynamic behavior. The key idea is to convert time series of the digital signal into a string of (spatially discrete) symbols from which the embedded dynamic information can be extracted in an unsupervised manner (i.e., no requirement for labeling of time series). The main challenges here are: (1) definition of the symbol assignment for the time series; (2) identification of the partitioning segment locations in the signal space of time series; and (3) construction of probabilistic finite-state automata (PFSA) from the symbol strings that contain temporal patterns. The reported work addresses these challenges by maximizing the mutual information measures between symbol strings and PFSA states. The proposed symbolization method has been validated by numerical simulation as well as by experimentation in a laboratory environment. Performance of the proposed algorithm has been compared to that of two commonly used algorithms of time series partitioning.
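
    The pipeline described above (partition the signal space, assign symbols, estimate a probabilistic automaton) can be sketched in a few lines. This is a simplified stand-in, not the paper's algorithm: it uses equal-width partitioning instead of the mutual-information-maximizing partitioning the authors propose, and it assumes the simplest automaton in which each state is the current symbol. All names are illustrative.

```python
from collections import Counter


def symbolize(signal: list[float], n_symbols: int) -> list[int]:
    """Map each sample to a symbol in 0..n_symbols-1 using equal-width bins.

    Equal-width binning is an assumption made for brevity; it is not the
    partitioning method proposed in the paper.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0] * len(signal)
    width = (hi - lo) / n_symbols
    # The maximum sample would land in bin n_symbols, so clamp it into the last bin.
    return [min(int((x - lo) / width), n_symbols - 1) for x in signal]


def transition_matrix(symbols: list[int], n_symbols: int) -> list[list[float]]:
    """Estimate symbol-to-symbol transition probabilities (one state per symbol)."""
    counts = Counter(zip(symbols, symbols[1:]))
    matrix = []
    for i in range(n_symbols):
        row = [counts[(i, j)] for j in range(n_symbols)]
        total = sum(row)
        # Rows for symbols that are never left stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy alternating signal, used only to exercise the two functions.
toy_symbols = symbolize([0.0, 1.0, 0.0, 1.0], n_symbols=2)
toy_matrix = transition_matrix(toy_symbols, n_symbols=2)
```

    Where the partition boundaries fall determines how much of the signal's dynamics survives symbolization, which is why the paper treats their placement as an optimization problem rather than fixing them in advance.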

  19. Measuring nuclear reaction cross sections to extract information on neutrinoless double beta decay

    Science.gov (United States)

    Cavallaro, M.; Cappuzzello, F.; Agodi, C.; Acosta, L.; Auerbach, N.; Bellone, J.; Bijker, R.; Bonanno, D.; Bongiovanni, D.; Borello-Lewin, T.; Boztosun, I.; Branchina, V.; Bussa, M. P.; Calabrese, S.; Calabretta, L.; Calanna, A.; Calvo, D.; Carbone, D.; Chávez Lomelí, E. R.; Coban, A.; Colonna, M.; D’Agostino, G.; De Geronimo, G.; Delaunay, F.; Deshmukh, N.; de Faria, P. N.; Ferraresi, C.; Ferreira, J. L.; Finocchiaro, P.; Fisichella, M.; Foti, A.; Gallo, G.; Garcia, U.; Giraudo, G.; Greco, V.; Hacisalihoglu, A.; Kotila, J.; Iazzi, F.; Introzzi, R.; Lanzalone, G.; Lavagno, A.; La Via, F.; Lay, J. A.; Lenske, H.; Linares, R.; Litrico, G.; Longhitano, F.; Lo Presti, D.; Lubian, J.; Medina, N.; Mendes, D. R.; Muoio, A.; Oliveira, J. R. B.; Pakou, A.; Pandola, L.; Petrascu, H.; Pinna, F.; Reito, S.; Rifuggiato, D.; Rodrigues, M. R. D.; Russo, A. D.; Russo, G.; Santagati, G.; Santopinto, E.; Sgouros, O.; Solakci, S. O.; Souliotis, G.; Soukeras, V.; Spatafora, A.; Torresi, D.; Tudisco, S.; Vsevolodovna, R. I. M.; Wheadon, R. J.; Yildirin, A.; Zagatto, V. A. B.

    2018-02-01

    Neutrinoless double beta decay (0νββ) is considered the best potential resource to access the absolute neutrino mass scale. Moreover, if observed, it will signal that neutrinos are their own anti-particles (Majorana particles). Presently, this physics case is one of the most important research topics “beyond the Standard Model” and might guide the way towards a Grand Unified Theory of fundamental interactions. Since the 0νββ decay process involves nuclei, its analysis necessarily implies nuclear structure issues. In the NURE project, supported by a Starting Grant of the European Research Council (ERC), nuclear reactions of double charge-exchange (DCE) are used as a tool to extract information on the 0νββ Nuclear Matrix Elements. In DCE reactions and ββ decay, the initial and final nuclear states are indeed the same and the transition operators have similar structure. Thus the measurement of the DCE absolute cross-sections can give crucial information on ββ matrix elements. In a wider view, the NUMEN international collaboration plans a major upgrade of the INFN-LNS facilities in the coming years in order to increase the experimental production of nuclei by at least two orders of magnitude, thus making feasible a systematic study of all the cases of interest as candidates for 0νββ.

  20. Draft Genome Sequence of Uncultured SAR324 Bacterium lautmerah10, Binned from a Red Sea Metagenome

    KAUST Repository

    Haroon, Mohamed

    2016-02-11

    A draft genome of SAR324 bacterium lautmerah10 was assembled from a metagenome of a surface water sample from the Red Sea, Saudi Arabia. The genome is more complete and has a higher G+C content than that of previously sequenced SAR324 representatives. Its genomic information shows a versatile metabolism that confers an advantage to SAR324, which is reflected in its distribution throughout different depths of the marine water column.
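
    The G+C content mentioned above is the fraction of bases in a sequence that are guanine or cytosine. A minimal sketch of that calculation follows; the choice to exclude ambiguous bases such as N from the denominator is an assumption, and the function name is illustrative.

```python
def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguous characters such as N are ignored, which is an assumption of this
    sketch; some tools count them in the denominator instead.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0
```

    For a binned draft genome the same function would typically be applied to the concatenation of all contigs rather than averaged per contig, so that long contigs carry proportionally more weight.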

  1. Cross-cutting activities: Soil quality and soil metagenomics

    OpenAIRE

    Motavalli, Peter P.; Garrett, Karen A.

    2008-01-01

    This presentation reports on the work of the SANREM CRSP cross-cutting activities "Assessing and Managing Soil Quality for Sustainable Agricultural Systems" and "Soil Metagenomics to Construct Indicators of Soil Degradation." The introduction gives an overview of the extensiveness of soil degradation globally and defines soil quality. The objectives of the soil quality cross cutting activity are: CCRA-4 (Soil Metagenomics)

  2. Dual-wavelength phase-shifting digital holography selectively extracting wavelength information from wavelength-multiplexed holograms.

    Science.gov (United States)

    Tahara, Tatsuki; Mori, Ryota; Kikunaga, Shuhei; Arai, Yasuhiko; Takaki, Yasuhiro

    2015-06-15

    Dual-wavelength phase-shifting digital holography that selectively extracts wavelength information from five wavelength-multiplexed holograms is presented. Specific phase shifts for respective wavelengths are introduced to remove the crosstalk components and extract only the object wave at the desired wavelength from the holograms. Object waves in multiple wavelengths are selectively extracted by utilizing 2π ambiguity and the subtraction procedures based on phase-shifting interferometry. Numerical results show the validity of the proposed technique. The proposed technique is also experimentally demonstrated.

  3. Metagenomics of Bacterial Diversity in Villa Luz Caves with Sulfur Water Springs

    Directory of Open Access Journals (Sweden)

    Giuseppe D’Auria

    2018-01-01

    New biotechnology applications require in-depth preliminary studies of biodiversity. The methods of massive sequencing using metagenomics and bioinformatics tools offer us sufficient and reliable knowledge to understand environmental diversity, to discover new microorganisms, and to take advantage of their functional genes. Villa Luz caves, in the southern Mexican state of Tabasco, are fed by at least 26 groundwater inlets, containing 300–500 mg L⁻¹ H₂S and <0.1 mg L⁻¹ O₂. We extracted environmental DNA for metagenomic analysis of samples collected at five selected Villa Luz cave sites, with pH values from 2.5 to 7. Foreign organisms found in this underground ecosystem can oxidize H₂S to H₂SO₄. These include: biovermiculites, a bacterial association that can grow on the rock walls; snottites, which are whitish, viscous biofilms hanging from the rock walls; and sacks or bags of phlegm, which live within the aquatic environment of the springs. Through tag-encoded FLX amplicon pyrosequencing (TEFAP), a total of 20,901 reads of amplification products from hypervariable regions V1 and V3 of the bacterial 16S rRNA gene in whole and pure metagenomic DNA samples were generated. Seven bacterial phyla were identified. As a result, Proteobacteria was more frequent than Acidobacteria. Finally, acidophilic Proteobacteria was detected in UJAT5 sample

  4. Overview of BioCreAtIvE: critical assessment of information extraction for biology

    Directory of Open Access Journals (Sweden)

    Hirschman Lynette

    2005-05-01

    Background: The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented in a workshop held in Granada, Spain, March 28–31, 2004. The articles collected in this BMC Bioinformatics supplement entitled "A critical assessment of text mining methods in molecular biology" describe the BioCreAtIvE tasks, systems, results and their independent evaluation. Results: BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles. Conclusion: The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision / recall or better, which potentially makes them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches where knowledge extrapolation and interpretation are required. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. There are 22 articles in this special issue, including six that provide analyses of results or data quality for the data sets, including a novel inter-annotator consistency assessment for the test set used in task 2.

  5. INFORMATION EXTRACTION AND DEPENDENCY ON OPEN GOVERNMENT DATA (OGD) FOR ENVIRONMENTAL MONITORING

    Directory of Open Access Journals (Sweden)

    H. Abdulmuttalib

    2016-06-01

    Environmental monitoring practices support decision makers of different government / private institutions, besides environmentalists and planners among others. This support helps them act towards the sustainability of our environment, and also take efficient measures for protecting human beings in general, but it is difficult to explore useful information from 'OGD' and assure its quality for the purpose. On the other hand, monitoring itself comprises detecting changes as they happen, or within the mitigation period range, which means that any source of data that is to be used for monitoring should replicate the information related to the period of environmental monitoring, or otherwise it is considered almost useless or historical. In this paper, an assessment of information extraction and structuring from Open Government Data 'OGD' that can be useful to environmental monitoring is performed, looking into availability, usefulness to environmental monitoring of a certain type, repetition period and dependencies. The assessment is performed on a small sample selected from OGD, bearing in mind the type of environmental change monitored, such as the increase and concentration of built-up areas and the reduction of green areas, or the change of temperature in a specific area. The World Bank mentioned in its blog that data is open if it satisfies both conditions of being technically open and legally open. The use of Open Data is thus regulated by published terms of use, or an agreement which implies some conditions without violating the above-mentioned two conditions. Within the scope of the paper I wish to share the experience of using some OGD for supporting environmental monitoring work that is performed to mitigate the production of carbon dioxide by regulating energy consumption and by properly designing the test area's landscapes, thus using Geodesign tactics; meanwhile I wish to add to the results

  6. Information Extraction and Dependency on Open Government Data (ogd) for Environmental Monitoring

    Science.gov (United States)

    Abdulmuttalib, Hussein

    2016-06-01

    Environmental monitoring practices support decision makers of different government / private institutions, besides environmentalists and planners among others. This support helps them act towards the sustainability of our environment, and also take efficient measures for protecting human beings in general, but it is difficult to explore useful information from 'OGD' and assure its quality for the purpose. On the other hand, monitoring itself comprises detecting changes as they happen, or within the mitigation period range, which means that any source of data that is to be used for monitoring should replicate the information related to the period of environmental monitoring, or otherwise it is considered almost useless or historical. In this paper, an assessment of information extraction and structuring from Open Government Data 'OGD' that can be useful to environmental monitoring is performed, looking into availability, usefulness to environmental monitoring of a certain type, repetition period and dependencies. The assessment is performed on a small sample selected from OGD, bearing in mind the type of environmental change monitored, such as the increase and concentration of built-up areas and the reduction of green areas, or the change of temperature in a specific area. The World Bank mentioned in its blog that data is open if it satisfies both conditions of being technically open and legally open. The use of Open Data is thus regulated by published terms of use, or an agreement which implies some conditions without violating the above-mentioned two conditions. Within the scope of the paper I wish to share the experience of using some OGD for supporting environmental monitoring work that is performed to mitigate the production of carbon dioxide by regulating energy consumption and by properly designing the test area's landscapes, thus using Geodesign tactics; meanwhile I wish to add to the results achieved by many

  7. RCN4GSC workshop report: managing data at the interface of biodiversity and (meta)genomics, March 2011

    Science.gov (United States)

    The Genomic Standards Consortium (GSC) is an international working body with the mission of working towards richer descriptions of genomic and metagenomic data through the development of standards and tools for supporting the consistent documentation of contextual information about sequences. Becaus...

  8. Mining metagenomic and metatranscriptomic data for clues about microbial metabolic functions in ruminants.

    Science.gov (United States)

    Li, Fuyong; Neves, Andre L A; Ghoshal, Bibaswan; Guan, Le Luo

    2017-12-20

    Metagenomics and metatranscriptomics can capture the whole genome and transcriptome repertoire of microorganisms through sequencing total DNA/RNA from various environmental samples, providing both taxonomic and functional information with high resolution. The unique and complex rumen microbial ecosystem is receiving great research attention because the rumen microbiota coevolves with the host and equips ruminants with the ability to convert cellulosic plant materials to high-protein products for human consumption. To date, hundreds to thousands of microbial phylotypes have been identified in the rumen using culture-independent molecular-based approaches, and genomic information of rumen microorganisms is rapidly accumulating through the single genome sequencing. However, functional characteristics of the rumen microbiome have not been well described because there are numerous uncultivable microorganisms in the rumen. The advent of metagenomics and metatranscriptomics along with advanced bioinformatics methods can help us better understand mechanisms of the rumen fermentation, which is vital for improving nutrient utilization and animal productivity. Therefore, in this review, we summarize a general workflow to conduct rumen metagenomics and metatranscriptomics and discuss how the data can be interpreted to be useful information. Moreover, we review recent literatures studying associations between the rumen microbiome and host phenotypes (e.g., feed efficiency and methane emissions) using these approaches, aiming to provide a useful guide to include studying the rumen microbiome as one of the research objectives using these 2 approaches. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  9. An Integrated Outlook on the Metagenome and Metabolome of Intestinal Diseases

    Directory of Open Access Journals (Sweden)

    Wanping Aw

    2015-11-01

    Metagenomics and metabolomics have recently become the two most rapidly advancing “omics” technologies. Metagenomics seeks to characterize the composition of microbial communities, their operations, and their dynamically co-evolving relationships with the habitats they occupy, whereas metabolomics studies the unique chemical endpoints (metabolites) that specific cellular processes leave behind. Remarkable progress in DNA sequencing and mass spectrometry technologies has enabled the comprehensive collection of information on the gut microbiome and its metabolome in order to assess the influence of the gut microbiota on host physiology on a whole-systems level. Our gut microbiota, which consists of prokaryotic cells together with their metabolites, creates a unique gut ecosystem together with the host eukaryotic cells. In this review, we highlight the detailed relationships between the gut microbiota and its metabolites on host health and the pathogenesis of various intestinal diseases such as inflammatory bowel disease and colorectal cancer. Therapeutic interventions such as probiotic and prebiotic administrations and fecal microbiota transplantations are also discussed. We would like to promote this unique biology-wide approach of incorporating metagenome and metabolome information, as we believe that it can help us understand the intricate interplay between gut microbiota and host metabolism to a greater extent. This novel integration of microbiome, metatranscriptome, and metabolome information will give us an improved holistic understanding of the complex mammalian superorganism, thereby providing new insights into novel therapeutic approaches for optimal intestinal health.

  10. Metagenomics: Retrospect and Prospects in High Throughput Age

    Directory of Open Access Journals (Sweden)

    Satish Kumar

    2015-01-01

    In recent years, metagenomics has emerged as a powerful tool for mining hidden microbial treasure in a culture-independent manner. In the last two decades, metagenomics has been applied extensively to exploit the concealed potential of microbial communities from almost all sorts of habitats. The progress made over this period is briefly discussed, from the origin of metagenomics to its current state, together with the discovery of novel biological functions of commercial importance from metagenomes of diverse habitats. The present review also highlights the paradigm shift of metagenomics from the basic study of community composition to insight into microbial community dynamics for harnessing the full potential of uncultured microbes, with more emphasis on the implications of breakthrough developments, namely Next Generation Sequencing, advanced bioinformatics tools, and systems biology.

  11. Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics

    Directory of Open Access Journals (Sweden)

    Karel Sedlar

    2017-01-01

    One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these issues were mostly addressed by amplicon sequencing of a target gene because of its reasonable price and the easier computational postprocessing of the bioinformatic data. With the advancement of sequencing techniques, the main focus shifted to whole metagenome shotgun sequencing, which allows much more detailed analysis of the metagenomic data, including reconstruction of novel microbial genomes, and yields knowledge about the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes; this approach therefore requires more sophisticated computational algorithms for clustering of related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies the DNA fragments by performing a standard homology inference against a reference database, while the latter performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on the length of sequences. Besides their basic principle, an overview of particular methods and tools is provided. Furthermore, the review covers the use of the methods in relation to sequence length and discusses the need for metagenomic data preprocessing in the form of initial assembly prior to binning.

  12. Information Extraction from Large-scale WSNs: Approaches and Research Issues Part II: Query-Based and Macroprogramming Approaches

    Directory of Open Access Journals (Sweden)

    Tessa DANIEL

    2008-07-01

    Regardless of the application domain and deployment scope, the ability to retrieve information is critical to the successful functioning of any wireless sensor network (WSN) system. In general, information extraction procedures can be categorized into three main approaches: agent-based, query-based and macroprogramming-led. Whilst query-based systems are the most popular, macroprogramming techniques provide a more general-purpose approach to distributed computation. Finally, the agent-based approaches tailor the information extraction mechanism to the type of information needed and the configuration of the network it needs to be extracted from. This suite of three papers (Part I-III) offers an extensive survey of the literature in the area of WSN information extraction, covering in Part I and Part II the three main approaches above. Part III highlights the open research questions and issues faced by deployable WSN system designers and discusses the potential benefits of both in-network processing and complex querying for large scale wireless informational systems.

  13. A metagenomic study of primate insect diet diversity.

    Science.gov (United States)

    Pickett, Sarah B; Bergey, Christina M; Di Fiore, Anthony

    2012-07-01

    Descriptions of primate diets are generally based on either direct observation of foraging behavior, morphological classification of food remains from feces, or analysis of the stomach contents of deceased individuals. Some diet items (e.g. insect prey), however, are difficult to identify visually, and observation conditions often do not permit adequate quantitative sampling of feeding behavior. Moreover, the taxonomically informative morphology of some food species (e.g. swallowed seeds, insect exoskeletons) may be destroyed by the digestive process. Because of these limitations, we used a metagenomic approach to conduct a preliminary, "proof of concept" study of interspecific variation in the insect component of the diets of six sympatric New World monkeys known, based on observational field studies, to differ markedly in their feeding ecology. We used generalized arthropod polymerase chain reaction (PCR) primers and cloning to sequence mitochondrial DNA (mtDNA) sequences of the arthropod cytochrome b (CYT B) gene from fecal samples of wild woolly, titi, saki, capuchin, squirrel, and spider monkeys collected from a single sampling site in western Amazonia where these genera occur sympatrically. We then assigned preliminary taxonomic identifications to the sequences by basic local alignment search tool (BLAST) comparison to arthropod CYT B sequences present in GenBank. This study is the first to use molecular techniques to identify insect prey in primate diets. The results suggest that a metagenomic approach may prove valuable in augmenting and corroborating observational data and increasing the resolution of primate diet studies, although the lack of comparative reference sequences for many South American insects limits the approach at present. As such reference data become available for more animal and plant taxa, this approach also holds promise for studying additional components of primate diets. © 2012 Wiley Periodicals, Inc.

  14. Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction.

    Science.gov (United States)

    Napolitano, Giulio; Marshall, Adele; Hamilton, Peter; Gavin, Anna T

    2016-06-01

    Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging. The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: 'semi-structured' and 'unstructured'. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry. The best result of 99.4% accuracy - which included only one semi-structured report predicted as unstructured - was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured. These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
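
    The reported figures can be related to the standard definitions of accuracy, precision and recall. The sketch below is illustrative only: it assumes the 99.4% accuracy refers to the 163-report test set with a single misclassified report, which is consistent with the abstract but not stated as such, and the counts passed to `precision_recall` are hypothetical values chosen to show the arithmetic, not data from the study.

```python
def accuracy(correct, total):
    """Fraction of items classified correctly."""
    return correct / total


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Assumption: 163 test reports, exactly one misclassified.
layout_accuracy = accuracy(163 - 1, 163)
print(f"layout accuracy: {layout_accuracy:.1%}")

# Hypothetical chunk counts (not from the paper), for illustration only.
p, r = precision_recall(tp=97, fp=3, fn=8)
print(f"precision={p:.2f} recall={r:.2f}")
```

    Under that assumption, 162 correct predictions out of 163 round to the 99.4% quoted in the abstract.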

  15. Analysis of composition-based metagenomic classification.

    Science.gov (United States)

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found out that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent differently metagenomic fragments that are in fact similar, also leading to low configuration scores. Regarding the similarity measure, in
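
    The encoding step described above, in which a sequence is represented by the frequencies of its n-mers and two sequences are compared with a similarity measure, can be sketched as follows. This is a minimal illustration, not the authors' framework: the function names are hypothetical, and cosine similarity is used only as one example of a similarity measure, since the abstract does not name the measures that were compared.

```python
import math
from collections import Counter
from itertools import product


def nmer_frequencies(seq, n):
    """Relative frequency of every possible DNA n-mer (4**n entries)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    # n-mers containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nonzero_fraction(vector):
    """Share of vector entries that are non-zero (a sparsity indicator)."""
    return sum(1 for x in vector if x > 0) / len(vector)


fragment = "ACGTTGCAACGTAGCTAGCTAGGATCCGATCG"  # toy fragment, not real data
for n in (1, 3, 6):
    vec = nmer_frequencies(fragment, n)
    print(n, len(vec), round(nonzero_fraction(vec), 3))
```

    On a short fragment, the vector for small n is dense and carries little discriminating detail, while for large n most of the 4**n entries are zero, which mirrors the trade-off the abstract describes for short and long n-mers.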

  16. Study of time-frequency characteristics of single snores: extracting new information for sleep apnea diagnosis

    Energy Technology Data Exchange (ETDEWEB)

    Castillo Escario, Y.; Blanco Almazan, D.; Camara Vazquez, M.A.; Jane Campos, R.

    2016-07-01

    Obstructive sleep apnea (OSA) is a highly prevalent chronic disease, especially in elderly and obese population. Despite constituting a huge health and economic problem, most patients remain undiagnosed due to limitations in current strategies. Therefore, it is essential to find cost-effective diagnostic alternatives. One of these novel approaches is the analysis of acoustic snoring signals. Snoring is an early symptom of OSA which carries pathophysiological information of high diagnostic value. For this reason, the main objective of this work is to study the characteristics of single snores of different types, from healthy and OSA subjects. To do that, we analyzed snoring signals from previous databases and developed an experimental protocol to record simulated OSA-related sounds and characterize the response of two commercial tracheal microphones. Automatic programs for filtering, downsampling, event detection and time-frequency analysis were built in MATLAB. We found that time-frequency maps and spectral parameters (central, mean and peak frequency and energy in the 100-500 Hz band) allow distinguishing regular snores of healthy subjects from non-regular snores and snores of OSA subjects. Regarding the two commercial microphones, we found that one of them was a suitable snoring sensor, while the other had a too restricted frequency response. Future work shall include a higher number of episodes and subjects, but our study has contributed to show how important the differences between regular and non-regular snores can be for OSA diagnosis, and how much clinically relevant information can be extracted from time-frequency maps and spectral parameters of single snores. (Author)
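
    Spectral parameters of the kind mentioned above can be computed from the power spectrum of a single sound segment. The authors worked in MATLAB and their exact definitions are not given here, so the following Python sketch rests on assumptions: peak frequency is taken as the frequency of the largest spectral bin, mean frequency as the power-weighted average frequency, and band energy as the fraction of total power between 100 and 500 Hz. A plain discrete Fourier transform is used to keep the example dependency-free; it is slow and suitable only for short segments.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided power spectrum via a direct DFT (O(n^2), short signals only)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spectral_parameters(signal, fs, band=(100.0, 500.0)):
    """Peak frequency, mean frequency and relative energy in `band`."""
    freqs, power = power_spectrum(signal, fs)
    total = sum(power)
    if total == 0:
        raise ValueError("signal has no energy")
    peak_index = max(range(len(power)), key=power.__getitem__)
    in_band = sum(p for f, p in zip(freqs, power) if band[0] <= f <= band[1])
    return {
        "peak_frequency": freqs[peak_index],
        "mean_frequency": sum(f * p for f, p in zip(freqs, power)) / total,
        "band_energy_fraction": in_band / total,
    }


# Synthetic 200 Hz tone sampled at 2 kHz (a stand-in, not a recorded snore).
fs = 2000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
print(spectral_parameters(tone, fs))
```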

  17. Study of quadrature FIR filters for extraction of low-frequency instantaneous information in biophysical signals

    Science.gov (United States)

    Arce-Guevara, Valdemar E.; Alba-Cadena, Alfonso; Mendez, Martín O.

    Quadrature bandpass filters take a real-valued signal and output an analytic signal from which the instantaneous amplitude and phase can be computed. For this reason, they represent a useful tool to extract time-varying, narrow-band information from electrophysiological signals such as electroencephalogram (EEG) or electrocardiogram. One of the defining characteristics of quadrature filters is its null response to negative frequencies. However, when the frequency band of interest is close to 0 Hz, a careless filter design could let through negative frequencies, producing distortions in the amplitude and phase of the output. In this work, three types of quadrature filters (Ideal, Gabor and Sinusoidal) have been evaluated using both artificial and real EEG signals. For the artificial signals, the performance of each filter was measured in terms of the distortion in amplitude and phase, and sensitivity to noise and bandwidth selection. For the real EEG signals, a qualitative evaluation of the dynamics of the synchronization between two EEG channels was performed. The results suggest that, while all filters under study behave similarly under noise, they differ in terms of their sensitivity to bandwidth choice. In this study, the Sinusoidal filter showed clear advantages for the estimation of low-frequency EEG synchronization.
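
    The idea of a quadrature bandpass filter can be illustrated with the simplest of the three designs, the ideal filter. The sketch below is an assumption-laden illustration rather than the authors' implementation: it works in the frequency domain, keeps only positive-frequency bins inside the pass band (doubling them), sets every other bin, including all negative frequencies, to zero, and transforms back to obtain the analytic signal. Function names are hypothetical, and a direct DFT is used so that no external library is required.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct discrete Fourier transform (O(n^2)); inverse scales by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t, v in enumerate(x))
        out.append(s / n if inverse else s)
    return out


def ideal_quadrature_filter(signal, fs, f_lo, f_hi):
    """Analytic band-limited signal: zero response outside [f_lo, f_hi]
    and at all negative frequencies."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = k * fs / n
        # Bins with 0 < k < n/2 are the positive frequencies.
        positive = 0 < k < n / 2
        if positive and f_lo <= freq <= f_hi:
            spectrum[k] *= 2
        else:
            spectrum[k] = 0
    return dft(spectrum, inverse=True)


# Synthetic test signal: 4 Hz cosine, amplitude 1.5, phase offset 0.3 rad.
fs, n = 128, 128
x = [1.5 * math.cos(2 * math.pi * 4 * t / fs + 0.3) for t in range(n)]
analytic = ideal_quadrature_filter(x, fs, f_lo=2.0, f_hi=6.0)
amplitude = [abs(z) for z in analytic]
phase = [cmath.phase(z) for z in analytic]
print(round(amplitude[0], 6), round(phase[0], 6))
```

    Because the negative-frequency bins are set to zero explicitly, this design has the null response to negative frequencies that the abstract identifies as a defining property, even when the pass band lies close to 0 Hz.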

  18. P3-19: Failure to Extract Velocity Information from Contours Induces the Footsteps Illusion

    Directory of Open Access Journals (Sweden)

    Tsubasa Tano

    2012-10-01

    When a black or white rectangle drifts horizontally across a background of black and white vertical stripes, the rectangle appears to stop and start as it crosses each stripe (the footsteps illusion; Anstis, 2001 Perception 30 785–794). Although previous studies indicate that confusion between contrast and velocity signals in the motion detectors or the spatial pattern of the background contribute to the footsteps illusion (e.g., Sunaga et al., 2008 Perception 37 902–914), it remains unclear which factor is critical. We hypothesize that the contour of the rectangle is significant to the footsteps illusion. A subjective experiment was conducted using modified rectangles whose contours were emphasized by adding contour lines, filling random dots inside, or putting illusory contour inducers on the four corners. Two kinds of rectangles were presented above and below central fixation simultaneously and the background stripes were scrolled from right to left, or vice versa. Participants were asked which rectangle was perceived to drift more smoothly. The results demonstrate that the footsteps illusion is reduced when the rectangle's contour is emphasized. Placing random dots inside the rectangle yielded a weaker illusion than surrounding the rectangle with lines. These results suggest that humans perceive the velocity of moving objects (or background) based on the extracted contours, which are constructed by integrating low spatial frequency information.

  19. Metagenomic Analysis of Chicken Gut Microbiota for Improving Metabolism and Health of Chickens — A Review

    Directory of Open Access Journals (Sweden)

    Ki Young Choi

    2015-09-01

    Chicken is a major food source for humans, hence it is important to understand the mechanisms involved in nutrient absorption in chicken. In the gastrointestinal tract (GIT), the microbiota plays a central role in enhancing nutrient absorption and strengthening the immune system, thereby affecting both growth and health of chicken. There is little information on the diversity and functions of chicken GIT microbiota, its impact on the host, and the interactions between the microbiota and host. Here, we review the recent metagenomic strategies to analyze the chicken GIT microbiota composition and its functions related to improving metabolism and health. We summarize the methodology of metagenomics for obtaining bacterial taxonomy and functional inferences of the GIT microbiota and suggest a set of indicator genes for monitoring and manipulating the microbiota to promote host health in the future.

  20. Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data

    Directory of Open Access Journals (Sweden)

    Wilhelm Larry J

    2007-11-01

    Background: One objective of metagenomics is to reconstruct information about specific uncultured organisms from fragmentary environmental DNA sequences. We used the genome of an isolate of the marine alphaproteobacterium SAR11 ('Candidatus Pelagibacter ubique'; strain HTCC1062), obtained from the cold, productive Oregon coast, as a query sequence to study variation in SAR11 metagenome sequence data from the Sargasso Sea, a warm, oligotrophic ocean gyre. Results: The average amino acid identity of SAR11 genes encoded by the metagenomic data to the query genome was only 71%, indicating significant evolutionary divergence between the coastal isolates and Sargasso Sea populations. However, an analysis of gene neighbors indicated that SAR11 genes in the Sargasso Sea metagenomic data match the gene order of the HTCC1062 genome in 96% of cases (> 85,000 observations), and that rearrangements are most frequent at predicted operon boundaries. There were no conserved examples of genes with known functions being found in the coastal isolates, but not the Sargasso Sea metagenomic data, or vice versa, suggesting that core regions of these diverse SAR11 genomes are relatively conserved in gene content. However, four hypervariable regions were observed, which may encode properties associated with variation in SAR11 ecotypes. The largest of these, HVR2, is a 48 kb region flanked by the sole 5S and 23S genes in the HTCC1062 genome, and mainly encodes genes that determine cell surface properties. A comparison of two closely related 'Candidatus Pelagibacter' genomes (HTCC1062 and HTCC1002) revealed a number of "gene indels" in core regions. Most of these were found to be polymorphic in the metagenomic data and showed evidence of purifying selection, suggesting that the same "polymorphic gene indels" are maintained in physically isolated SAR11 populations. Conclusion: These findings suggest that natural selection has conserved many core features of SAR11

  1. Metagenome-Based Metabolic Reconstruction Reveals the Ecophysiological Function of Epsilonproteobacteria in a Hydrocarbon-Contaminated Sulfidic Aquifer

    OpenAIRE

    Keller, Andreas H.; Schleinitz, Kathleen M.; Starke, Robert; Bertilsson, Stefan; Vogt, Carsten; Kleinsteuber, Sabine

    2015-01-01

    The population genome of an uncultured bacterium assigned to the Campylobacterales (Epsilonproteobacteria) was reconstructed from a metagenome dataset obtained by whole-genome shotgun pyrosequencing. Genomic DNA was extracted from a sulfate-reducing, m-xylene-mineralizing enrichment culture isolated from groundwater of a benzene-contaminated sulfidic aquifer. The identical epsilonproteobacterial phylotype has previously been detected in toluene- or benzene-mineralizing, sulfate-reducing conso...

  2. Comparative metagenomics of Daphnia symbionts

    Directory of Open Access Journals (Sweden)

    Preston James F

    2009-04-01

    Background: Shotgun sequences of DNA extracts from whole organisms allow a comprehensive assessment of possible symbionts. The current project makes use of four shotgun datasets from three species of the planktonic freshwater crustacean Daphnia: one dataset each from clones of D. pulex and D. pulicaria and two datasets from one clone of D. magna. We analyzed these datasets with three aims: First, we search for bacterial symbionts that are present in all three species. Second, we search for evidence of Cyanobacteria and plastids, which had been suggested to occur as symbionts in a related Daphnia species. Third, we compare the metacommunities revealed by two different 454 pyrosequencing methods (GS 20 and GS FLX). Results: In all datasets we found evidence for a large number of bacteria belonging to diverse taxa. The vast majority of these were Proteobacteria. Of those, most sequences were assigned to different genera of the Betaproteobacteria family Comamonadaceae. Other taxa represented in all datasets included the genera Flavobacterium, Rhodobacter, Chromobacterium, Methylibium, Bordetella, Burkholderia and Cupriavidus. A few taxa matched sequences only from the D. pulex and the D. pulicaria datasets: Aeromonas, Pseudomonas and Delftia. Taxa with many hits specific to a single dataset were rare. For most of the identified taxa, earlier studies reported the finding of related taxa in aquatic environmental samples. We found no clear evidence for the presence of symbiotic Cyanobacteria or plastids. The apparent similarity of the symbiont communities of the three Daphnia species breaks down on a species and strain level. Communities have a similar composition at a higher taxonomic level, but the actual sequences found are divergent. The two Daphnia magna datasets obtained from two different pyrosequencing platforms revealed rather similar results. Conclusion: Three clones from three species of the genus Daphnia were found to harbor a rich

  3. Metagenomic islands of hyperhalophiles: the case of Salinibacter ruber

    Directory of Open Access Journals (Sweden)

    Rohwer Forest

    2009-12-01

    Background: Saturated brines are extreme environments of low diversity. Salinibacter ruber is the only bacterium that inhabits this environment in significant numbers. In order to establish the extent of genetic diversity in natural populations of this microbe, the genomic sequence of reference strain DSM 13855 was compared to metagenomic fragments recovered from climax saltern crystallizers and obtained with 454 sequencing technology. This kind of analysis reveals the presence of metagenomic islands, i.e. highly variable regions among the different lineages in the population. Results: Three regions of the sequenced isolate were scarcely represented in the metagenome, thus appearing to vary among co-occurring S. ruber cells. These metagenomic islands showed evidence of extensive genomic corruption, with atypically low GC content, low coding density, high numbers of pseudogenes and short hypothetical proteins. A detailed analysis of island gene content showed that the genes in metagenomic island 1 code for cell surface polysaccharides. The strain-specific genes of metagenomic island 2 were found to be involved in biosynthesis of cell wall polysaccharide components. Finally, metagenomic island 3 was rich in DNA-related enzymes. Conclusion: The genomic organisation of the variable genomic regions of S. ruber showed a number of convergences with the genomic islands of marine microbes studied, being largely involved in variable cell surface traits. This variation at the level of cell envelopes in an environment devoid of grazing pressure probably reflects a global strategy of bacteria to escape phage predation.
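
    One of the signals mentioned above, atypically low GC content, can be screened for with a sliding window along a genome sequence. The sketch below is a generic illustration, not the procedure used in the study: the window size, step and margin are hypothetical parameters, and the toy sequence is synthetic.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """(start, GC fraction) for each full window along the sequence."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def low_gc_windows(seq, window, step, margin):
    """Start positions of windows whose GC content falls more than
    `margin` below the GC content of the whole sequence."""
    overall = gc_content(seq)
    return [start for start, gc in windowed_gc(seq, window, step)
            if gc < overall - margin]


# Synthetic sequence: GC-rich flanks around a short AT-rich stretch.
toy = "GC" * 50 + "AT" * 20 + "GC" * 50
print(round(gc_content(toy), 3))
print(low_gc_windows(toy, window=40, step=40, margin=0.2))
```

    Windows flagged this way are only candidates; the abstract identifies the islands primarily by their scarce representation in the metagenome, with low GC content as an accompanying feature.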

  4. New Bacterial Phytase through Metagenomic Prospection

    Directory of Open Access Journals (Sweden)

    Nathálya Farias

    2018-02-01

    Alkaline phytases from uncultured microorganisms, which hydrolyze phytate to less phosphorylated myo-inositols and inorganic phosphate, have great potential as additives in the agricultural industry. The development of metagenomics has stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. In this study, a gene encoding a phytase was cloned from red rice crop residues and castor bean cake using a metagenomics strategy. The amino acid identity between this gene and its closest published counterparts is lower than 60%. The phytase was named PhyRC001 and was biochemically characterized. This recombinant protein showed activity on sodium phytate, indicating that PhyRC001 is a hydrolase enzyme. The enzymatic activity was optimal at a pH of 7.0 and at a temperature of 35 °C. β-propeller phytases possess great potential as feed additives because they are the only type of phytase with high activity at neutral pH. Exploring and exploiting the underlying mechanism of β-propeller phytase function could therefore be of great benefit to biotechnology.

  5. Metagenomic characterization of ambulances across the USA.

    Science.gov (United States)

    O'Hara, Niamh B; Reed, Harry J; Afshinnekoo, Ebrahim; Harvin, Donell; Caplan, Nora; Rosen, Gail; Frye, Brook; Woloszynek, Stephen; Ounit, Rachid; Levy, Shawn; Butler, Erin; Mason, Christopher E

    2017-09-22

    Microbial communities in our built environments have great influence on human health and disease. A variety of built environments have been characterized using a metagenomics-based approach, including some healthcare settings. However, there has been no study to date that has used this approach in pre-hospital settings, such as ambulances, an important first point-of-contact between patients and hospitals. We sequenced 398 samples from 137 ambulances across the USA using shotgun sequencing. We analyzed these data to explore the microbial ecology of ambulances including characterizing microbial community composition, nosocomial pathogens, patterns of diversity, presence of functional pathways and antimicrobial resistance, and potential spatial and environmental factors that may contribute to community composition. We found that the top 10 most abundant species are either common built environment microbes, microbes associated with the human microbiome (e.g., skin), or are species associated with nosocomial infections. We also found widespread evidence of antimicrobial resistance markers (hits ~ 90% samples). We identified six factors that may influence the microbial ecology of ambulances including ambulance surfaces, geographical-related factors (including region, longitude, and latitude), and weather-related factors (including temperature and precipitation). While the vast majority of microbial species classified were beneficial, we also found widespread evidence of species associated with nosocomial infections and antimicrobial resistance markers. This study indicates that metagenomics may be useful to characterize the microbial ecology of pre-hospital ambulance settings and that more rigorous testing and cleaning of ambulances may be warranted.

  6. Metagenomic investigation of gastrointestinal microbiome in cattle

    Directory of Open Access Journals (Sweden)

    Minseok Kim

    2017-11-01

    The gastrointestinal (GI) tract, including the rumen and the other intestinal segments of cattle, harbors a diverse, complex, and dynamic microbiome that drives feed digestion and fermentation in cattle, determining feed efficiency and output of pollutants. This microbiome also plays an important role in affecting host health. Research has been conducted for more than a century to understand the microbiome and its relationship to feed efficiency and host health. Traditional cultivation-based research elucidated some of the major metabolic processes, but studies using molecular biology techniques conducted from the late 1980s to the early 2000s greatly expanded our view of the diversity of the rumen and intestinal microbiome of cattle. Recently, metagenomics has been the primary technology used to characterize the GI microbiome and its relationship with host nutrition and health. This review addresses the main methods/techniques in current use, the knowledge gained, and some of the challenges that remain. Most of the primers used in quantitative real-time polymerase chain reaction quantification and in metagenomic diversity analysis of ruminal bacteria, archaea, fungi, and protozoa are also compiled.

  7. Very Large Graphs for Information Extraction (VLG). Summary of First-Year Proof-of-Concept Study

    Science.gov (United States)

    2013-08-20

    [Figure: total proxy log records over time, Sep-12 to Sep-26; vertical axis from 200M to 700M.]

  8. Extraction as a source of additional information when concentrations in multicomponent systems are simultaneously determined

    International Nuclear Information System (INIS)

    Perkov, I.G.

    1988-01-01

    Using the photometric determination of Nd and Sm in each other's presence as an example, the possibility of exploiting the effect of extraction to increase the analytical signal is considered. It is shown that interligand exchange in extracts, combined with simultaneous determination of concentrations, can serve as a simple means of increasing the accuracy of determination. 5 refs.; 2 figs.; 3 tabs.

  9. Bioprospecting Potential of the Soil Metagenome: Novel Enzymes and Bioactivities

    Directory of Open Access Journals (Sweden)

    Myung Hwan Lee

    2013-09-01

    Full Text Available The microbial diversity in soil ecosystems is higher than in any other microbial ecosystem. The majority of soil microorganisms has not been characterized, because the dominant members have not been readily culturable on standard cultivation media; therefore, the soil ecosystem is a great reservoir for the discovery of novel microbial enzymes and bioactivities. The soil metagenome, the collective microbial genome, could be cloned and sequenced directly from soils to search for novel microbial resources. This review summarizes the microbial diversity in soils and the efforts to search for microbial resources from the soil metagenome, with more emphasis on the potential of bioprospecting metagenomics and recent discoveries.

  10. The Microbiome of Brazilian Mangrove Sediments as Revealed by Metagenomics

    Science.gov (United States)

    Andreote, Fernando Dini; Jiménez, Diego Javier; Chaves, Diego; Dias, Armando Cavalcante Franco; Luvizotto, Danice Mazzer; Dini-Andreote, Francisco; Fasanella, Cristiane Cipola; Lopez, Maryeimy Varon; Baena, Sandra; Taketani, Rodrigo Gouvêa; de Melo, Itamar Soares

    2012-01-01

    Here we embark on a deep metagenomic survey that revealed the taxonomic and potential metabolic pathway aspects of mangrove sediment microbiology. The extraction of DNA from sediment samples and the direct application of pyrosequencing resulted in approximately 215 Mb of data from four distinct mangrove areas (BrMgv01 to 04) in Brazil. The taxonomic approaches applied revealed the dominance of Deltaproteobacteria and Gammaproteobacteria in the samples. Paired statistical analysis showed higher proportions of specific taxonomic groups in each dataset. The metabolic reconstruction indicated the possible occurrence of processes modulated by the prevailing conditions found in mangrove sediments. In terms of carbon cycling, the sequences indicated the prevalence of genes involved in the metabolism of methane, formaldehyde, and carbon dioxide. With respect to the nitrogen cycle, evidence for sequences associated with dissimilatory reduction of nitrate, nitrogen immobilization, and denitrification was detected. Sequences related to the production of adenylsulfate, sulfite, and H2S were relevant to the sulphur cycle. These data indicate that the microbial core involved in methane, nitrogen, and sulphur metabolism consists mainly of Burkholderiaceae, Planctomycetaceae, Rhodobacteraceae, and Desulfobacteraceae. Comparison of our data to datasets from soil and sea samples resulted in the allotment of the mangrove sediments between those samples. The results of this study add valuable data about the composition of microbial communities in mangroves and also shed light on possible transformations promoted by microbial organisms in mangrove sediments. PMID:22737213

  11. The microbiome of Brazilian mangrove sediments as revealed by metagenomics.

    Directory of Open Access Journals (Sweden)

    Fernando Dini Andreote

    Full Text Available Here we embark on a deep metagenomic survey that revealed the taxonomic and potential metabolic pathway aspects of mangrove sediment microbiology. The extraction of DNA from sediment samples and the direct application of pyrosequencing resulted in approximately 215 Mb of data from four distinct mangrove areas (BrMgv01 to 04) in Brazil. The taxonomic approaches applied revealed the dominance of Deltaproteobacteria and Gammaproteobacteria in the samples. Paired statistical analysis showed higher proportions of specific taxonomic groups in each dataset. The metabolic reconstruction indicated the possible occurrence of processes modulated by the prevailing conditions found in mangrove sediments. In terms of carbon cycling, the sequences indicated the prevalence of genes involved in the metabolism of methane, formaldehyde, and carbon dioxide. With respect to the nitrogen cycle, evidence for sequences associated with dissimilatory reduction of nitrate, nitrogen immobilization, and denitrification was detected. Sequences related to the production of adenylsulfate, sulfite, and H2S were relevant to the sulphur cycle. These data indicate that the microbial core involved in methane, nitrogen, and sulphur metabolism consists mainly of Burkholderiaceae, Planctomycetaceae, Rhodobacteraceae, and Desulfobacteraceae. Comparison of our data to datasets from soil and sea samples resulted in the allotment of the mangrove sediments between those samples. The results of this study add valuable data about the composition of microbial communities in mangroves and also shed light on possible transformations promoted by microbial organisms in mangrove sediments.

  12. 3D building reconstruction based on given ground plan information and surface models extracted from spaceborne imagery

    Science.gov (United States)

    Tack, Frederik; Buyuksalih, Gurcan; Goossens, Rudi

    2012-01-01

    3D surface models have gained ground as an important tool for urban planning and mapping. However, urban environments are complex to model and offer a challenging test of the current limits of automatic digital surface modeling from high resolution satellite imagery. An approach is introduced to improve a 3D surface model, extracted photogrammetrically from satellite imagery, based on the geometric building information embodied in existing 2D ground plans. First, buildings are clipped from the extracted DSM based on the 2D polygonal building ground plans. To generate prismatic shaped structures with vertical walls and flat roofs, building shape is retrieved from the cadastre database while elevation information is extracted from the DSM. Within each 2D building boundary, a constant roof height is extracted based on statistical calculations of the height values. After buildings are extracted from the initial surface model, the remaining DSM is further processed and simplified to a smooth DTM that reflects bare ground, without artifacts, local relief, vegetation, cars and city furniture. In the next phase, both models are merged to yield an integrated city model or generalized DSM. The accuracy of the generalized surface model is assessed according to a quantitative-statistical analysis by comparison with two different types of reference data.
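    The roof-height step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says "statistical calculations" without naming the statistic, so the median is an assumed choice, and the function names, the grid layout (one height per cell, cell centre at column + 0.5, row + 0.5) and the footprint format are hypothetical.

```python
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def flat_roof_height(dsm, footprint):
    """Assumed statistic: median DSM height of cells whose centre lies in the footprint."""
    heights = [
        h
        for r, row in enumerate(dsm)
        for c, h in enumerate(row)
        if point_in_polygon(c + 0.5, r + 0.5, footprint)
    ]
    return median(heights) if heights else None


# Hypothetical 4x4 DSM (metres) with one building and one outlier cell (30).
dsm = [
    [2, 2, 2, 2],
    [2, 10, 11, 2],
    [2, 10, 30, 2],
    [2, 2, 2, 2],
]
footprint = [(1, 1), (3, 1), (3, 3), (1, 3)]
roof = flat_roof_height(dsm, footprint)
```

    A median is used here because it is less sensitive than a mean to isolated outliers such as the 30 m cell, but any robust statistic would fit the description equally well.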

  13. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2015-05-29

    Modern microbial mats are potential analogues of some of Earth’s earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted shotgun metagenomic analyses for the first time. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. Taxonomic classification of the microbial community using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla (Proteobacteria, Cyanobacteria and Bacteroidetes) dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potentials (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities were also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-rubisco-based carbon metabolism, including the reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways, is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat-forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats.

  14. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software

    DEFF Research Database (Denmark)

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter

    2017-01-01

    Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark...... their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially...... affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI...

  15. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs

    Directory of Open Access Journals (Sweden)

    Erica C. Pehrsson

    2013-06-01

    Full Text Available Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens.

  16. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs.

    Science.gov (United States)

    Pehrsson, Erica C; Forsberg, Kevin J; Gibson, Molly K; Ahmadi, Sara; Dantas, Gautam

    2013-01-01

    Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens.

  17. Identification of Viral Pathogen Diversity in Sewage Sludge by Metagenome Analysis

    Science.gov (United States)

    BIBBY, KYLE; PECCIA, JORDAN

    2013-01-01

    The large diversity of viruses that exist in human populations are potentially excreted into sewage collection systems and concentrated in sewage sludge. In the US, the primary fate of processed sewage sludge (class B biosolids) is application to agricultural land as a soil amendment. To characterize and understand infectious risks associated with land application, and to describe the diversity of viruses in human populations, shotgun viral metagenomics was applied to 10 sewage sludge samples from 5 wastewater treatment plants throughout the continental U.S., each serving between 100,000 and 1,000,000 people. Nearly 330 million DNA sequences were produced and assembled, and annotation resulted in identifying 43 (26 DNA, 17 RNA) different types of human viruses in sewage sludge. Novel insights include the high abundance of newly emerging viruses (e.g. Coronavirus HKU1, Klassevirus, and Cosavirus), the strong representation of respiratory viruses, and the relatively minor abundance and occurrence of Enteroviruses. Viral metagenome sequence annotations were reproducible, and independent PCR-based identification of selected viruses suggests that viral metagenomes were a conservative estimate of the true viral occurrence and diversity. These results represent the most complete description of human virus diversity in any wastewater sample to date, provide engineers and environmental scientists with critical information on important viral agents and routes of infection from exposure to wastewater and sewage sludge, and represent a significant leap forward in understanding the pathogen content of class B biosolids. PMID:23346855

  18. MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles.

    Science.gov (United States)

    Shaw, Grace Tzun-Wen; Pao, Yueh-Yang; Wang, Daryi

    2016-11-25

    The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently been applied to the analysis of metagenomic data. In this paper, we present the Lotka-Volterra model based tool, the Metagenomic Microbial Interaction Simulator (MetaMIS), which is designed to analyze the time series data of microbial community profiles. MetaMIS first infers underlying microbial interactions from abundance tables for operational taxonomic units (OTUs) and then interprets interaction networks using the Lotka-Volterra model. We also embed a Bray-Curtis dissimilarity method in MetaMIS in order to evaluate the similarity to biological reality. MetaMIS is designed to tolerate a high level of missing data, and can estimate interaction information without the influence of rare microbes. For each interaction network, MetaMIS systematically examines interaction patterns (such as mutualism or competition) and refines the biotic role within microbes. As a case study, we collect a human male fecal microbiome and show that Micrococcaceae, a relatively low abundance OTU, is highly connected with 13 dominant OTUs and seems to play a critical role. MetaMIS is able to organize multiple interaction networks into a consensus network for comparative studies; thus, as a case study, we have also identified a consensus interaction network between female and male fecal microbiomes. MetaMIS provides an efficient and user-friendly platform that may reveal new insights into metagenomics data. MetaMIS is freely available at: https://sourceforge.net/projects/metamis/ .
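    The modelling ideas named in this abstract can be illustrated with a short sketch. It is not MetaMIS code: it assumes the generalized Lotka-Volterra form dx_i/dt = x_i (r_i + sum_j a_ij x_j), integrates it with a plain forward-Euler step, and adds the standard Bray-Curtis dissimilarity for comparing a predicted profile with an observed one. All function names, growth rates and interaction values are hypothetical.

```python
def glv_step(x, r, a, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i + sum_j a[i][j] * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        growth = r[i] + sum(a[i][j] * x[j] for j in range(n))
        # Abundances cannot be negative, so clamp at zero.
        new_x.append(max(0.0, x[i] + dt * x[i] * growth))
    return new_x


def simulate(x0, r, a, dt=0.01, steps=1000):
    """Integrate the model for a fixed number of Euler steps."""
    x = list(x0)
    for _ in range(steps):
        x = glv_step(x, r, a, dt)
    return x


def interaction_type(a, i, j):
    """Label a pair by the signs of its two interaction coefficients."""
    if a[i][j] > 0 and a[j][i] > 0:
        return "mutualism"
    if a[i][j] < 0 and a[j][i] < 0:
        return "competition"
    return "other"


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(p - q) for p, q in zip(u, v)) / total


# Hypothetical two-OTU community with mutual inhibition and self-limitation.
r = [1.0, 0.8]
a = [[-1.0, -0.3],
     [-0.2, -1.0]]
final = simulate([0.1, 0.1], r, a, dt=0.01, steps=5000)
```

    In practice the interaction matrix would be inferred from time-series abundance tables rather than written by hand; the sketch only shows how such a matrix, once available, drives the dynamics and how two profiles can be compared.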

  19. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

    Directory of Open Access Journals (Sweden)

    Emma L. Doughty

    2014-09-01

    Full Text Available Tuberculosis remains a major global health problem. Laboratory diagnostic methods that allow effective, early detection of cases are central to management of tuberculosis in the individual patient and in the community. Since the 1880s, laboratory diagnosis of tuberculosis has relied primarily on microscopy and culture. However, microscopy fails to provide species- or lineage-level identification and culture-based workflows for diagnosis of tuberculosis remain complex, expensive, slow, technically demanding and poorly able to handle mixed infections. We therefore explored the potential of shotgun metagenomics, sequencing of DNA from samples without culture or target-specific amplification or capture, to detect and characterise strains from the Mycobacterium tuberculosis complex in smear-positive sputum samples obtained from The Gambia in West Africa. Eight smear- and culture-positive sputum samples were investigated using a differential-lysis protocol followed by a kit-based DNA extraction method, with sequencing performed on a benchtop sequencing instrument, the Illumina MiSeq. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238. The proportion of reads in each metagenome mapping against the human genome ranged from 20% to 99%. We were able to detect sequences from the M. tuberculosis complex in all eight samples, with coverage of the H37Rv reference genome ranging from 0.002X to 0.7X. By analysing the distribution of large sequence polymorphisms (deletions and the locations of the insertion element IS6110) and single nucleotide polymorphisms (SNPs), we were able to assign seven of eight metagenome-derived genomes to a species and lineage within the M. tuberculosis complex. Two metagenome-derived mycobacterial genomes were assigned to M. africanum, a species largely confined to West Africa; the others that could be assigned belonged to lineages T, H or LAM within the clade of “modern” M. tuberculosis.

  20. Comparative Metagenomics of Freshwater Microbial Communities

    International Nuclear Information System (INIS)

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-01-01

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete set of B12 biosynthesis pathway at high abundance suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  1. Comparative Metagenomics of Freshwater Microbial Communities

    Energy Technology Data Exchange (ETDEWEB)

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-05-17

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete set of B12 biosynthesis pathway at high abundance suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  2. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    Directory of Open Access Journals (Sweden)

    J. Sharmila

    2016-01-01

    Full Text Available Research on web mining is becoming more important because a large amount of information is now managed through the web. Web usage is expanding in an uncontrolled way, and a dedicated framework is required to handle such a large volume of information. Web mining is divided into three major areas: web content mining, web usage mining and web structure mining. Tak-Lam Wong proposed a web content mining methodology based on Bayesian Networks (BN), which learns to extract web data and discover attributes using a Bayesian approach. Inspired by that work, we propose a web content mining methodology based on a Deep Learning algorithm. Deep Learning is preferred over BN because BN does not involve the kind of learning architecture used in the proposed system. The main objective of this study is web document extraction using different classification algorithms and their analysis. This work extracts data from web URLs and presents three classification algorithms: a Deep Learning algorithm, a Bayesian algorithm and a BPNN algorithm. Deep Learning is a powerful set of techniques for learning in neural networks, applied in areas such as computer vision, speech recognition, natural language processing and biometrics. Deep Learning is one of the simpler classification techniques, is used for a subset of a large field, and takes less time for classification. Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features. The BPNN algorithm is then used for classification. Initially, the training and testing datasets contain multiple URLs. We then extract the content from the dataset. The

  3. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  4. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families

    Science.gov (United States)

    Popovic, Ana; Hai, Tran; Tchigvintsev, Anatoly; Hajighasemi, Mahbod; Nocek, Boguslaw; Khusnutdinova, Anna N.; Brown, Greg; Glinos, Julia; Flick, Robert; Skarina, Tatiana; Chernikova, Tatyana N.; Yim, Veronica; Brüls, Thomas; Paslier, Denis Le; Yakimov, Michail M.; Joachimiak, Andrzej; Ferrer, Manuel; Golyshina, Olga V.; Savchenko, Alexei; Golyshin, Peter N.; Yakunin, Alexander F.

    2017-01-01

    Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools. PMID:28272521

  5. Assessment of metagenomic assembly using simulated next generation sequencing data

    DEFF Research Database (Denmark)

    Mende, Daniel R; Waller, Alison S; Sunagawa, Shinichi

    2012-01-01

    the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition......Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators...... with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved...

  6. FY11 Report on Metagenome Analysis using Pathogen Marker Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, Shea N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Allen, Jonathan E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); McLoughlin, Kevin S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Slezak, Tom [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2011-06-02

    A method, sequence library, and software suite were invented to rapidly assess whether any member of a pre-specified list of threat organisms or their near neighbors is present in a metagenome. The system was designed to handle mega- to giga-bases of FASTA-formatted raw sequence reads from short or long read next generation sequencing platforms. The approach is to pre-calculate a viral and a bacterial "Pathogen Marker Library" (PML) containing sub-sequences specific to pathogens or their near neighbors. A list of expected matches comparing every bacterial or viral genome against the PML sequences is also pre-calculated. To analyze a metagenome, reads are compared to the PML, and observed PML-metagenome matches are compared to the expected PML-genome matches, and the ratio of observed relative to expected matches is reported. In other words, a 3-way comparison among the PML, metagenome, and existing genome sequences is used to quickly assess which (if any) species included in the PML is likely to be present in the metagenome, based on available sequence data. Our tests showed that the species with the most PML matches correctly indicated the organism sequenced for empirical metagenomes consisting of a cultured, relatively pure isolate. These runs completed in 1 minute to 3 hours on 12 CPUs (1 thread/CPU), depending on the metagenome and PML. Using more threads on the same number of CPUs resulted in speed improvements roughly proportional to the number of threads. Simulations indicated that detection sensitivity depends on both sequencing coverage levels for a species and the size of the PML: species were correctly detected even at ~0.003x coverage by the large PMLs, and at ~0.03x coverage by the smaller PMLs. Matches to true positive species were 3-4 orders of magnitude higher than to false positives. Simulations with short reads (36 nt and ~260 nt) showed that species were usually detected for metagenome coverage above 0.005x and coverage in the PML above 0.05x, and
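
    The observed-to-expected comparison described in this record can be illustrated with a toy sketch. It is not the authors' software: the function names, the exact-substring matching rule, the choice to count distinct markers, and the sequences are all assumptions made for illustration.

```python
def markers_found(sequences, markers):
    """Return the subset of markers that occur (exact substring) in any sequence."""
    return {m for m in markers if any(m in s for s in sequences)}


def observed_to_expected(reads, pml, genomes):
    """For each species, report observed marker matches relative to expected ones.

    pml     : dict species -> list of marker sub-sequences (hypothetical toy library)
    genomes : dict species -> reference genome string
    reads   : list of metagenome read strings
    """
    ratios = {}
    for species, markers in pml.items():
        # Expected matches: markers that the reference genome itself contains.
        expected = markers_found([genomes[species]], markers)
        if not expected:
            continue  # nothing to compare against for this species
        # Observed matches: expected markers that also appear in the reads.
        observed = markers_found(reads, expected)
        ratios[species] = len(observed) / len(expected)
    return ratios


# Toy data (assumed, not from the record).
pml = {"species_a": ["ACGTAC", "GGATCC"], "species_b": ["TTTAAA"]}
genomes = {"species_a": "TTACGTACTTGGATCCTT", "species_b": "CCTTTAAACC"}
reads = ["GACGTACG", "CCCCCCCC"]
ratios = observed_to_expected(reads, pml, genomes)
```

    In this toy case one of the two markers expected for species_a is seen in the reads and none for species_b, so species_a gets the higher ratio.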

  7. Visualization and Analysis of Geology Word Vectors for Efficient Information Extraction

    Science.gov (United States)

    Floyd, J. S.

    2016-12-01

    allow one to extract information from hundreds of papers or more and find relationships in less time than it would take to read all of the papers. As machine learning tools become more commonly available, more and more scientists will be able to use and refine these tools for their individual needs.

  8. Information extraction from dynamic PS-InSAR time series using machine learning

    Science.gov (United States)

    van de Kerkhof, B.; Pankratius, V.; Chang, L.; van Swol, R.; Hanssen, R. F.

    2017-12-01

    Due to the increasing number of SAR satellites, with shorter repeat intervals and higher resolutions, SAR data volumes are exploding. Time series analyses of SAR data, i.e. Persistent Scatterer (PS) InSAR, enable the deformation monitoring of the built environment at an unprecedented scale, with hundreds of scatterers per km², updated weekly. Potential hazards, e.g. due to failure of aging infrastructure, can be detected at an early stage. Yet, this requires the operational data processing of billions of measurement points, over hundreds of epochs, updating this data set dynamically as new data come in, and testing whether points (start to) behave in an anomalous way. Moreover, the quality of PS-InSAR measurements is ambiguous and heterogeneous, which will yield false positives and false negatives. Such analyses are numerically challenging. Here we extract relevant information from PS-InSAR time series using machine learning algorithms. We cluster (group together) time series with similar behaviour, even though they may not be spatially close, such that the results can be used for further analysis. First we reduce the dimensionality of the dataset in order to be able to cluster the data, since applying clustering techniques to high-dimensional datasets often yields unsatisfactory results. Our approach is to apply t-distributed Stochastic Neighbor Embedding (t-SNE), a machine learning algorithm for dimensionality reduction of high-dimensional data to a 2D or 3D map, and cluster this result using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results show that we are able to detect and cluster time series with similar behaviour, which is the starting point for more extensive analysis into the underlying driving mechanisms. The results of the methods are compared to conventional hypothesis testing as well as a Self-Organising Map (SOM) approach. Hypothesis testing is robust and takes the stochastic nature of the observations into account
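
    The clustering step named in this record, DBSCAN, can be sketched in plain Python as below. This is a minimal illustration, not the authors' pipeline: the t-SNE embedding is omitted, the 2D points merely stand in for embedded time series, and `eps` and `min_samples` are assumed values.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep growing the cluster
    return labels


# Stand-in for 2D embedding coordinates (assumed toy data).
embedded = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
labels = dbscan(embedded, eps=0.5, min_samples=3)
```

    Points in the two dense groups receive cluster labels, while the isolated point is marked as noise, which matches the record's aim of grouping series with similar behaviour regardless of spatial position.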

  9. Integrated Metagenomic and Metatranscriptomic Analyses of Microbial Communities in the Meso- and Bathypelagic Realm of North Pacific Ocean

    Directory of Open Access Journals (Sweden)

    Deirdre R. Meldrum

    2013-10-01

    Although emerging evidence indicates that deep-sea water contains an untapped reservoir of high metabolic and genetic diversity, this realm has not been studied well compared with surface sea water. The study provided the first integrated meta-genomic and -transcriptomic analysis of the microbial communities in deep-sea water of the North Pacific Ocean. DNA/RNA amplifications and simultaneous metagenomic and metatranscriptomic analyses were employed to discover information concerning deep-sea microbial communities from four different deep-sea sites ranging from the mesopelagic to pelagic ocean. Within the prokaryotic community, bacteria are dominant (~90%) over archaea in both metagenomic and metatranscriptomic data pools. The emergence of archaeal phyla Crenarchaeota, Euryarchaeota, Thaumarchaeota, bacterial phyla Actinobacteria, Firmicutes, sub-phyla Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria, and the decrease of bacterial phyla Bacteroidetes and Alphaproteobacteria are the main composition changes of prokaryotic communities in the deep-sea water, when compared with the reference Global Ocean Sampling Expedition (GOS) surface water. Photosynthetic Cyanobacteria exist in all four metagenomic libraries and two metatranscriptomic libraries. In the Eukaryota community, decreased abundance of fungi and algae in the deep sea was observed. The RNA/DNA ratio was employed as an index of the metabolic activity of microbes in the deep sea. Functional analysis indicated that deep-sea microbes are leading a defensive lifestyle.

  10. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  11. MGkit: Metagenomic Framework For The Study Of Microbial Communities

    OpenAIRE

    Rubino, Francesco; Creevey, C. J.

    2014-01-01

    Introduction While metagenomics has been used extensively to study microbial communities from a taxonomic and functional perspective, little has been done to address how the species in a microbiome are adapted to and maintain specific roles in dynamic environments like the rumen. Rationale To address this issue we have developed a framework for the robust analysis of metagenomic data that includes fully automated analysis from next-generation sequencing (NGS) reads to assembly, gene ...

  12. Going Deeper: Metagenome of a Hadopelagic Microbial Community

    OpenAIRE

    Eloe, Emiley A.; Fadrosh, Douglas W.; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E.; Lasken, Roger; Williamson, Shannon J.; Bartlett, Douglas H.

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets fr...

  13. Expanding the marine virosphere using metagenomics.

    Directory of Open Access Journals (Sweden)

    Carolina Megumi Mizuno

    Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. The classical approaches studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed description of complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages of which 10 are completely new. Sequence-based methods have allowed host assignment to many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes like members of SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics confirming the remarkable mosaicism of phage genomes.

  14. TempoWordNet : une ressource lexicale pour l'extraction d'information temporelle

    OpenAIRE

    Hasanuzzaman, Mohammed

    2016-01-01

    The ability to capture the time information conveyed in natural language, where that information is expressed either explicitly, or implicitly, or connotative, is essential to many natural language processing applications such as information retrieval, question answering, automatic summarization, targeted marketing, loan repayment forecasting, and understanding economic patterns. Associating word senses with temporal orientation to grasp the temporal information in language is relatively stra...

  15. Characterization of a Soil Metagenome-Derived Gene Encoding Wax Ester Synthase.

    Science.gov (United States)

    Kim, Nam Hee; Park, Ji-Hye; Chung, Eunsook; So, Hyun-Ah; Lee, Myung Hwan; Kim, Jin-Cheol; Hwang, Eul Chul; Lee, Seon-Woo

    2016-02-01

    A soil metagenome contains the genomes of all microbes included in a soil sample, including those that cannot be cultured. In this study, soil metagenome libraries were searched for microbial genes exhibiting lipolytic activity and those involved in potential lipid metabolism that could yield valuable products in microorganisms. One of the subclones derived from the original fosmid clone, pELP120, was selected for further analysis. A subclone spanning a 3.3 kb DNA fragment was found to encode for lipase/esterase and contained an additional partial open reading frame encoding a wax ester synthase (WES) motif. Consequently, both pELP120 and the full length of the gene potentially encoding WES were sequenced. To determine if the wes gene encoded a functioning WES protein that produced wax esters, gas chromatography-mass spectroscopy was conducted using ethyl acetate extract from an Escherichia coli strain that expressed the wes gene and was grown with hexadecanol. The ethyl acetate extract from this E. coli strain did indeed produce wax ester compounds of various carbon-chain lengths. DNA sequence analysis of the full-length gene revealed that the gene cluster may be derived from a member of Proteobacteria, whereas the clone does not contain any clear phylogenetic markers. These results suggest that the wes gene discovered in this study encodes a functional protein in E. coli and produces wax esters through a heterologous expression system.

  16. Functional Screening of Antibiotic Resistance Genes from a Representative Metagenomic Library of Food Fermenting Microbiota

    Directory of Open Access Journals (Sweden)

    Chiara Devirgiliis

    2014-01-01

    Lactic acid bacteria (LAB) represent the predominant microbiota in fermented foods. Foodborne LAB have received increasing attention as a potential reservoir of antibiotic resistance (AR) determinants, which may be horizontally transferred to opportunistic pathogens. We have previously reported isolation of AR LAB from the raw ingredients of a fermented cheese, while AR genes could be detected in the final, marketed product only by PCR amplification, thus pointing at the need for more sensitive microbial isolation techniques. We turned therefore to construction of a metagenomic library containing microbial DNA extracted directly from the food matrix. To maximize yield and purity and to ensure that genomic complexity of the library was representative of the original bacterial population, we defined a suitable protocol for total DNA extraction from cheese which can also be applied to other lipid-rich foods. Functional library screening on different antibiotics allowed recovery of ampicillin and kanamycin resistant clones originating from Streptococcus salivarius subsp. thermophilus and Lactobacillus helveticus genomes. We report molecular characterization of the cloned inserts, which were fully sequenced and shown to confer AR phenotype to recipient bacteria. We also show that metagenomics can be applied to food microbiota to identify underrepresented species carrying specific genes of interest.

  17. Gene-Based Pathogen Detection: Can We Use qPCR to Predict the Outcome of Diagnostic Metagenomics?

    DEFF Research Database (Denmark)

    Andersen, Sandra Christine; Fachmann, Mette Sofie Rousing; Kiil, Kristoffer

    2017-01-01

    In microbial food safety, molecular methods such as quantitative PCR (qPCR) and next-generation sequencing (NGS) of bacterial isolates can potentially be replaced by diagnostic shotgun metagenomics. However, the methods for pre-analytical sample preparation are often optimized for qPCR, and do...... not necessarily perform equally well for qPCR and sequencing. The present study investigates, through screening of methods, whether qPCR can be used as an indicator for the optimization of sample preparation for NGS-based shotgun metagenomics with a diagnostic focus. This was used on human fecal samples spiked...... with 10³ or 10⁶ colony-forming units (CFU)/g Campylobacter jejuni, as well as porcine fecal samples spiked with 10³ or 10⁶ CFU/g Salmonella typhimurium. DNA was extracted from the samples using variations of two widely used kits. The following quality parameters were measured: DNA concentration, qPCR, DNA...

  18. The BEL information extraction workflow (BELIEF): Evaluation in the BioCreative V BEL and IAT track

    OpenAIRE

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboa...

  19. Proactive Response to Potential Material Shortages Arising from Environmental Restrictions Using Automatic Discovery and Extraction of Information from Technical Documents

    Science.gov (United States)

    2012-12-21

    documents is via web links on manufacturer and distributor product catalogs on the web. These links were discovered using XSB’s focused crawler technology...regulatory information with specific items Focused Crawler Data In addition to PDF documents, we made use of data collected from the web using XSB, Inc...listed in the catalog. As a rule, the focused crawler extracts attributes and values as they appear on the web page. To be useful for this project

  20. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  1. Comparison of Qinzhou bay wetland landscape information extraction by three methods

    Directory of Open Access Journals (Sweden)

    X. Chang

    2014-04-01

    and OO is 219 km², 193.70 km², and 217.40 km², respectively. The result indicates that SC ranks first, followed by the OO approach, with the DT method third, when used to extract the Qinzhou Bay coastal wetland.

  2. Microbial community profiling of human saliva using shotgun metagenomic sequencing.

    Directory of Open Access Journals (Sweden)

    Nur A Hasan

    Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify bacterial species using unassembled short NGS reads, was used to identify the bacterial species comprising the microbiomes of the saliva samples and datasets. Results, achieved within minutes and at greater than 90% accuracy, showed more than 175 bacterial species comprised the bacterial flora of human saliva, including bacteria known to be commensal human flora but also Haemophilus influenzae, Neisseria meningitidis, Streptococcus pneumoniae, and Gammaproteobacteria. Basic Local Alignment Search Tool (BLASTn) analysis in parallel reported ca. five times more species than those actually comprising the in silico sample. Both GENIUS and BLAST analyses of saliva samples identified major genera comprising the bacterial flora of saliva, but GENIUS provided a more precise description of species composition, identifying to strain in most cases, and delivered results at least 10,000 times faster. Therefore, GENIUS offers a facile and accurate system for identification and quantification of bacterial species and/or strains in metagenomic samples.

  3. Modulations of the chicken cecal microbiome and metagenome in response to anticoccidial and growth promoter treatment.

    Directory of Open Access Journals (Sweden)

    Jessica L Danzeisen

    With increasing pressures to reduce or eliminate the use of antimicrobials for growth promotion purposes in production animals, there is a growing need to better understand the effects elicited by these agents in order to identify alternative approaches that might be used to maintain animal health. Antibiotic usage at subtherapeutic levels is postulated to confer a number of modulations in the microbes within the gut that ultimately result in growth promotion and reduced occurrence of disease. This study examined the effects of the coccidiostat monensin and the growth promoters virginiamycin and tylosin on the broiler chicken cecal microbiome and metagenome. Using a longitudinal design, cecal contents of commercial chickens were extracted and examined using 16S rRNA and total DNA shotgun metagenomic pyrosequencing. A number of genus-level enrichments and depletions were observed in response to monensin alone, or monensin in combination with virginiamycin or tylosin. Of note, monensin effects included depletions of Roseburia, Lactobacillus and Enterococcus, and enrichments in Coprococcus and Anaerofilum. The most notable effect observed in the monensin/virginiamycin and monensin/tylosin treatments, but not in the monensin-alone treatments, was enrichments in Escherichia coli. Analysis of the metagenomic dataset identified enrichments in transport system genes, type I fimbrial genes, and type IV conjugative secretion system genes. No significant differences were observed with regard to antimicrobial resistance gene counts. Overall, this study provides a more comprehensive glimpse of the chicken cecum microbial community, the modulations of this community in response to growth promoters, and targets for future efforts to mimic these effects using alternative approaches.

  4. Modulations of the chicken cecal microbiome and metagenome in response to anticoccidial and growth promoter treatment.

    Science.gov (United States)

    Danzeisen, Jessica L; Kim, Hyeun Bum; Isaacson, Richard E; Tu, Zheng Jin; Johnson, Timothy J

    2011-01-01

    With increasing pressures to reduce or eliminate the use of antimicrobials for growth promotion purposes in production animals, there is a growing need to better understand the effects elicited by these agents in order to identify alternative approaches that might be used to maintain animal health. Antibiotic usage at subtherapeutic levels is postulated to confer a number of modulations in the microbes within the gut that ultimately result in growth promotion and reduced occurrence of disease. This study examined the effects of the coccidiostat monensin and the growth promoters virginiamycin and tylosin on the broiler chicken cecal microbiome and metagenome. Using a longitudinal design, cecal contents of commercial chickens were extracted and examined using 16S rRNA and total DNA shotgun metagenomic pyrosequencing. A number of genus-level enrichments and depletions were observed in response to monensin alone, or monensin in combination with virginiamycin or tylosin. Of note, monensin effects included depletions of Roseburia, Lactobacillus and Enterococcus, and enrichments in Coprococcus and Anaerofilum. The most notable effect observed in the monensin/virginiamycin and monensin/tylosin treatments, but not in the monensin-alone treatments, was enrichments in Escherichia coli. Analysis of the metagenomic dataset identified enrichments in transport system genes, type I fimbrial genes, and type IV conjugative secretion system genes. No significant differences were observed with regard to antimicrobial resistance gene counts. Overall, this study provides a more comprehensive glimpse of the chicken cecum microbial community, the modulations of this community in response to growth promoters, and targets for future efforts to mimic these effects using alternative approaches.

  5. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library

    Directory of Open Access Journals (Sweden)

    Parachin Nádia

    2011-05-01

    Background: Xylose isomerase (XI) catalyses the isomerisation of xylose to xylulose in bacteria and some fungi. Currently, only a limited number of XI genes have been functionally expressed in Saccharomyces cerevisiae, the microorganism of choice for lignocellulosic ethanol production. The objective of the present study was to search for novel XI genes in the vastly diverse microbial habitat present in soil. As the exploitation of microbial diversity is impaired by the limited ability to cultivate soil microorganisms under standard laboratory conditions, a metagenomic approach, consisting of total DNA extraction from a given environment followed by cloning of DNA into suitable vectors, was undertaken. Results: A soil metagenomic library was constructed and two screening methods based on protein sequence similarity and enzyme activity were investigated to isolate novel XI encoding genes. These two screening approaches identified the xym1 and xym2 genes, respectively. Sequence and phylogenetic analyses revealed that the genes shared 67% similarity and belonged to different bacterial groups. When xym1 and xym2 were overexpressed in a xylA-deficient Escherichia coli strain, similar growth rates to those in which the Piromyces XI gene was expressed were obtained. However, expression in S. cerevisiae resulted in only one-fourth the growth rate of that obtained for the strain expressing the Piromyces XI gene. Conclusions: For the first time, the screening of a soil metagenomic library in E. coli resulted in the successful isolation of two active XIs. However, the discrepancy between XI enzyme performance in E. coli and S. cerevisiae suggests that future screening for XI activity from soil should be pursued directly using yeast as a host.

  6. Going deeper: metagenome of a hadopelagic microbial community.

    Directory of Open Access Journals (Sweden)

    Emiley A Eloe

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, but were most magnified in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides was also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were utilized to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT, Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, as well as recruiting heavily the PRT metagenomic sequence data compared to 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional

  7. MIDAS. An algorithm for the extraction of modal information from experimentally determined transfer functions

    International Nuclear Information System (INIS)

    Durrans, R.F.

    1978-12-01

    In order to design reactor structures to withstand the large flow and acoustic forces present it is necessary to know something of their dynamic properties. In many cases these properties cannot be predicted theoretically and it is necessary to determine them experimentally. The algorithm MIDAS (Modal Identification for the Dynamic Analysis of Structures) which has been developed at B.N.L. for extracting these structural properties from experimental data is described. (author)

  8. A metagenome of a full-scale microbial community carrying out Enhanced Biological Phosphorus Removal

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hansen, Lea Benedicte Skov; Saunders, Aaron Marc

    2012-01-01

    in situ hybridization (qFISH) was applied as an independent method to evaluate the community structure. The results were in qualitative agreement, but a DNA extraction bias against gram positive bacteria using standard extraction protocols was identified, which would not have been identified without...... the use of qFISH. The genetic potential for community function showed enrichment of genes involved in phosphate metabolism and biofilm formation, reflecting the selective pressure of the EBPR process. Most contigs in the assembled metagenome had low similarity to genes from currently sequenced genomes...... bacteria by qFISH, but the depth of sequencing enabled detailed insight into their microdiversity in the full-scale plant. Only 15% of the reads matching Accumulibacter had a high similarity (>95%) to the sequenced Accumulibacter clade IIA strain UW-1 genome, indicating the presence of some microdiversity...

  9. Extracting Information about the Initial State from the Black Hole Radiation.

    Science.gov (United States)

    Lochan, Kinjalk; Padmanabhan, T

    2016-02-05

    The crux of the black hole information paradox is related to the fact that the complete information about the initial state of a quantum field in a collapsing spacetime is not available to future asymptotic observers, belying the expectations from a unitary quantum theory. We study the imprints of the initial quantum state contained in a specific class of distortions of the black hole radiation and identify the classes of in states that can be partially or fully reconstructed from the information contained within. Even for the general in state, we can uncover some specific information. These results suggest that a classical collapse scenario ignores this richness of information in the resulting spectrum and a consistent quantum treatment of the entire collapse process might allow us to retrieve much more information from the spectrum of the final radiation.

  10. Meta4: a web-application for sharing and annotating metagenomic gene predictions using web-services

    Directory of Open Access Journals (Sweden)

    Emily J Richardson

    2013-09-01

    Whole-genome-shotgun (WGS) metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web-application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web-services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website (http://www.ark-genomics.org/bioinformatics/meta4), code is available on GitHub (https://github.com/mw55309/meta4), a cloud image is available, and an example implementation can be seen at http://www.ark-genomics.org/tools/meta4

  11. Unsupervised improvement of named entity extraction in short informal context using disambiguation clues

    NARCIS (Netherlands)

    Habib, Mena Badieh; van Keulen, Maurice

    2012-01-01

    Short context messages (like tweets and SMS’s) are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts done in this direction rely on machine learning techniques

  12. Automated Methods to Extract Patient New Information from Clinical Notes in Electronic Health Record Systems

    Science.gov (United States)

    Zhang, Rui

    2013-01-01

    The widespread adoption of Electronic Health Record (EHR) has resulted in rapid text proliferation within clinical care. Clinicians' use of copying and pasting functions in EHR systems further compounds this by creating a large amount of redundant clinical information in clinical documents. A mixture of redundant information (especially outdated…

  13. Lung region extraction based on the model information and the inversed MIP method by using chest CT images

    International Nuclear Information System (INIS)

    Tomita, Toshihiro; Miguchi, Ryosuke; Okumura, Toshiaki; Yamamoto, Shinji; Matsumoto, Mitsuomi; Tateno, Yukio; Iinuma, Takeshi; Matsumoto, Toru.

    1997-01-01

    We developed a lung region extraction method based on the model information and the inversed MIP method in the Lung Cancer Screening CT (LSCT). The original model is composed of typical 3-D lung contour lines, a body axis, an apical point, and a convex hull. First, the body axis, the apical point, and the convex hull are automatically extracted from the input image. Next, the model is transformed by an affine transformation to fit those of the input image. Using the same affine transformation coefficients, the typical lung contour lines are also transferred, and they correspond to rough contour lines of the input image. Experimental results on 68 samples showed this method to be quite promising. (author)

  14. A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data

    Directory of Open Access Journals (Sweden)

    Masanori Kakuta

    2017-10-01

    Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to be scalable over 10,000 CPU (Central Processing Unit) cores, and achieved over 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
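
    The abstract mentions a search algorithm based on suffix arrays. As a rough illustration of that general idea only (this is not GHOST-MP's algorithm, and the function names `build_suffix_array` and `find_occurrences` are hypothetical), the sketch below builds a suffix array naively and uses binary search to locate every exact occurrence of a short seed in a database sequence.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction for illustration; real tools use far faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return sorted start positions of exact matches of `pattern` in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    database = "ACGTACGTGACG"  # toy sequence, not real data
    sa = build_suffix_array(database)
    print(find_occurrences(database, sa, "ACG"))
```

    Because all suffixes sharing a prefix are adjacent in the sorted array, each lookup costs a logarithmic number of comparisons once the array is built, which is why suffix arrays suit repeated seed searches against a fixed database.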

  15. Metagenomic analysis of permafrost microbial community response to thaw

    Energy Technology Data Exchange (ETDEWEB)

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related these data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in cycling of C and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  16. Application of metagenomics in the human gut microbiome.

    Science.gov (United States)

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-21

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of the next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as its relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, functional dysbiosis of the intestinal microbiome, and determine interactions and co-evolution between microbiota and host, though there are still some limitations. Metatranscriptomics, metaproteomics and metabolomics represent enormous complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome with encouraging prospects. The limitations of metagenomics to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed.

  17. Meta-IDBA: a de Novo assembler for metagenomic data.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2011-07-01

    Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performances of these assemblers on metagenomic data are far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which make the assembly problem much more complicated. We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species, using a consensus sequence. Comparison of the performances of Meta-IDBA and existing assemblers, such as Velvet and Abyss, for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. Contact: chin@cs.hku.hk.
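
    To make the first core step more concrete, here is a minimal sketch of building a de Bruijn graph from reads and splitting it into connected components. It is an assumption-laden simplification: Meta-IDBA's actual partitioning rule (the "important observation") is not given in the abstract, so plain connected components are swapped in, and the names `build_de_bruijn` and `connected_components` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
            graph.setdefault(kmer[1:], set())  # ensure the target node is present
    return graph


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Split the graph into components, ignoring edge direction (illustrative partition only)."""
    undirected: Dict[str, Set[str]] = defaultdict(set)
    for u, targets in graph.items():
        undirected[u]  # touch the key so isolated nodes are kept
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component: Set[str] = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in undirected[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GGGCCC"]  # two made-up "species" sharing no k-mers
    for component in connected_components(build_de_bruijn(toy_reads, k=3)):
        print(sorted(component))
```

    In real metagenomes, regions shared between species join components together, which is the complication the abstract describes; a practical partitioner has to decide which connecting paths to cut rather than rely on connectivity alone.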

  18. Inference of microbial recombination rates from metagenomic data.

    Directory of Open Access Journals (Sweden)

    Philip L F Johnson

    2009-10-01

    Metagenomic sequencing projects from environments dominated by a small number of species produce genome-wide population samples. We present a two-site composite likelihood estimator of the scaled recombination rate, rho = 2N_e c, that operates on metagenomic assemblies in which each sequenced fragment derives from a different individual. This new estimator properly accounts for sequencing error, as quantified by per-base quality scores, and missing data, as inferred from the placement of reads in a metagenomic assembly. We apply our estimator to data from a sludge metagenome project to demonstrate how this method will elucidate the rates of exchange of genetic material in natural microbial populations. Surprisingly, for a fixed amount of sequencing, this estimator has lower variance than similar methods that operate on more traditional population genetic samples of comparable size. In addition, we can infer variation in recombination rate across the genome because metagenomic projects sample genetic diversity genome-wide, not just at particular loci. The method itself makes no assumption specific to microbial populations, opening the door for application to any mixed population sample where the number of individuals sampled is much greater than the number of fragments sequenced.

  19. The great screen anomaly-a new frontier in product discovery through functional metagenomics

    NARCIS (Netherlands)

    Ekkers, David Matthias; Cretoiu, Mariana Silvia; Kielak, Anna Maria; van Elsas, Jan Dirk

    Functional metagenomics, the study of the collective genome of a microbial community by expressing it in a foreign host, is an emerging field in biotechnology. Over the past years, the possibility of novel product discovery through metagenomics has developed rapidly. Thus, metagenomics has been

  20. The oral metagenome in health and disease.

    Science.gov (United States)

    Belda-Ferre, Pedro; Alcaraz, Luis David; Cabrera-Rubio, Raúl; Romero, Héctor; Simón-Soro, Aurea; Pignatelli, Miguel; Mira, Alex

    2012-01-01

    The oral cavity of humans is inhabited by hundreds of bacterial species and some of them have a key role in the development of oral diseases, mainly dental caries and periodontitis. We describe for the first time the metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities. Direct pyrosequencing of eight samples with different oral-health status produced 1 Gbp of sequence without the biases imposed by PCR or cloning. These data show that cavities are not dominated by Streptococcus mutans (the species originally identified as the etiological agent of dental caries) but are in fact a complex community formed by tens of bacterial species, in agreement with the view that caries is a polymicrobial disease. The analysis of the reads indicated that the oral cavity is functionally a different environment from the gut, with many functional categories enriched in one of the two environments and depleted in the other. Individuals who had never suffered from dental caries showed an over-representation of several functional categories, like genes for antimicrobial peptides and quorum sensing. In addition, they did not have mutans streptococci but displayed high recruitment of other species. Several isolates belonging to these dominant bacteria in healthy individuals were cultured and shown to inhibit the growth of cariogenic bacteria, suggesting the use of these commensal bacterial strains as probiotics to promote oral health and prevent dental caries.

  1. Metagenomic analysis of phosphorus removing sludge communities

    Energy Technology Data Exchange (ETDEWEB)

    Garcia Martin, Hector; Ivanova, Natalia; Kunin, Victor; Warnecke, Falk; Barry, Kerrie; McHardy, Alice C.; Yeates, Christine; He, Shaomei; Salamov, Asaf; Szeto, Ernest; Dalin, Eileen; Putnam, Nik; Shapiro, Harris J.; Pangilinan, Jasmyn L.; Rigoutsos, Isidore; Kyrpides, Nikos C.; Blackall, Linda Louise; McMahon, Katherine D.; Hugenholtz, Philip

    2006-02-01

    Enhanced Biological Phosphorus Removal (EBPR) is not well understood at the metabolic level despite being one of the best-studied microbially-mediated industrial processes due to its ecological and economic relevance. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium, "Candidatus Accumulibacter phosphatis." This analysis resolves several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much-needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.

  2. Metagenomic scaffolds enable combinatorial lignin transformation.

    Science.gov (United States)

    Strachan, Cameron R; Singh, Rahul; VanInsberghe, David; Ievdokymenko, Kateryna; Budwill, Karen; Mohn, William W; Eltis, Lindsay D; Hallam, Steven J

    2014-07-15

    Engineering the microbial transformation of lignocellulosic biomass is essential to developing modern biorefining processes that alleviate reliance on petroleum-derived energy and chemicals. Many current bioprocess streams depend on the genetic tractability of Escherichia coli with a primary emphasis on engineering cellulose/hemicellulose catabolism, small molecule production, and resistance to product inhibition. Conversely, bioprocess streams for lignin transformation remain embryonic, with relatively few environmental strains or enzymes implicated. Here we develop a biosensor responsive to monoaromatic lignin transformation products compatible with functional screening in E. coli. We use this biosensor to retrieve metagenomic scaffolds sourced from coal bed bacterial communities conferring an array of lignin transformation phenotypes that synergize in combination. Transposon mutagenesis and comparative sequence analysis of active clones identified genes encoding six functional classes mediating lignin transformation phenotypes that appear to be rearrayed in nature via horizontal gene transfer. Lignin transformation activity was then demonstrated for one of the predicted gene products encoding a multicopper oxidase to validate the screen. These results illuminate cellular and community-wide networks acting on aromatic polymers and expand the toolkit for engineering recombinant lignin transformation based on ecological design principles.

  3. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  4. Wavelet analysis of molecular dynamics: Efficient extraction of time-frequency information in ultrafast optical processes

    International Nuclear Information System (INIS)

    Prior, Javier; Castro, Enrique; Chin, Alex W.; Almeida, Javier; Huelga, Susana F.; Plenio, Martin B.

    2013-01-01

    New experimental techniques based on nonlinear ultrafast spectroscopies have been developed over the last few years, and have been demonstrated to provide powerful probes of quantum dynamics in different types of molecular aggregates, including both natural and artificial light harvesting complexes. Fourier transform-based spectroscopies have been particularly successful, yet “complete” spectral information normally necessitates the loss of all information on the temporal sequence of events in a signal. This information though is particularly important in transient or multi-stage processes, in which the spectral decomposition of the data evolves in time. By going through several examples of ultrafast quantum dynamics, we demonstrate that the use of wavelets provides an efficient and accurate way to simultaneously acquire both temporal and frequency information about a signal, and argue that this greatly aids the elucidation and interpretation of the physical processes responsible for non-stationary spectroscopic features, such as those encountered in coherent excitonic energy transport.

  5. FIVA : Functional Information Viewer and Analyzer extracting biological knowledge from transcriptome data of prokaryotes

    NARCIS (Netherlands)

    Blom, E.J.; Bosman, D.W.; van Hijum, S.A F T; Breitling, R.; Tijsma, L.; Silvis, R.; Roerdink, J.B.T.M.; Kuipers, O.P.

    2007-01-01

    FIVA (Function Information Viewer and Analyzer) aids researchers in the prokaryotic community to quickly identify relevant biological processes following transcriptome analysis. Our software assists in functional profiling of large sets of genes and generates a comprehensive overview of affected

  6. Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry

    Directory of Open Access Journals (Sweden)

    William P. Inskeep

    2013-05-01

    Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (~40-45 Mbase Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G+C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high temperature systems of YNP.

  7. Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry.

    Science.gov (United States)

    Inskeep, William P; Jay, Zackary J; Herrgard, Markus J; Kozubal, Mark A; Rusch, Douglas B; Tringe, Susannah G; Macur, Richard E; Jennings, Ryan deM; Boyd, Eric S; Spear, John R; Roberto, Francisco F

    2013-01-01

    Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40-45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP.

  8. Metagenomic sequencing complements routine diagnostics in identifying viral pathogens in lung transplant recipients with unknown etiology of respiratory infection.

    Science.gov (United States)

    Lewandowska, Dagmara W; Schreiber, Peter W; Schuurmans, Macé M; Ruehe, Bettina; Zagordi, Osvaldo; Bayard, Cornelia; Greiner, Michael; Geissberger, Fabienne D; Capaul, Riccarda; Zbinden, Andrea; Böni, Jürg; Benden, Christian; Mueller, Nicolas J; Trkola, Alexandra; Huber, Michael

    2017-01-01

    Lung transplant patients are a vulnerable group of immunosuppressed patients that are prone to frequent respiratory infections. We studied 60 episodes of respiratory symptoms in 71 lung transplant patients. Almost half of these episodes were of unknown infectious etiology despite extensive routine diagnostic testing. We re-analyzed respiratory samples of all episodes with undetermined etiology in order to detect potential viral pathogens missed/not accounted for in routine diagnostics. Respiratory samples were enriched for viruses by filtration and nuclease digestion, whole nucleic acids extracted and randomly amplified before high throughput metagenomic virus sequencing. Viruses were identified by a bioinformatic pipeline and confirmed and quantified using specific real-time PCR. In completion of routine diagnostics, we identified and confirmed a viral etiology of infection by our metagenomic approach in four patients (three Rhinovirus A, one Rhinovirus B infection) despite initial negative results in specific multiplex PCR. Notably, the majority of samples were also positive for Torque teno virus (TTV) and Human Herpesvirus 7 (HHV-7). While TTV viral loads increased with immunosuppression in both throat swabs and blood samples, HHV-7 remained at low levels throughout the observation period and was restricted to the respiratory tract. This study highlights the potential of metagenomic sequencing for virus diagnostics in cases with previously unknown etiology of infection and in complex diagnostic situations such as in immunocompromised hosts.

  9. RNA viral metagenome of whiteflies leads to the discovery and characterization of a whitefly-transmitted carlavirus in North America.

    Directory of Open Access Journals (Sweden)

    Karyna Rosario

    Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector.

  10. Text mining tools for extracting information about microbial biodiversity in food

    OpenAIRE

    Deleger, Louise; Bossy, Robert; Nédellec, Claire

    2017-01-01

    Introduction Information on food microbial biodiversity is scattered across millions of scientific papers (2 million references in the PubMed bibliographic database in 2017). It is impossible to manually achieve an exhaustive analysis of these documents. Text-mining and knowledge engineering methods can assist the researcher in finding relevant information. Material & Methods We propose to study bacterial biodiversity using text-mining tools from the Alvis platform. First, w...

  11. Investigation of the Impact of Extracting and Exchanging Health Information by Using Internet and Social Networks.

    Science.gov (United States)

    Pistolis, John; Zimeras, Stelios; Chardalias, Kostas; Roupa, Zoe; Fildisis, George; Diomidous, Marianna

    2016-06-01

    Social networks (1) have been embedded in our daily life for a long time. They constitute a powerful tool used nowadays for both searching and exchanging information on different issues by using Internet search engines (Google, Bing, etc.) and Social Networks (Facebook, Twitter, etc.). This paper presents the results of a study of the frequency and type of use of the Internet and Social Networks by the general public and health professionals. The objectives of the research focused on investigating how frequently both individuals and health practitioners seek and meticulously search for health information in social media. The exchange of information is a procedure that involves the issues of reliability and quality of information. In this research, advanced statistical techniques are used to investigate the participants' profile in using social networks for searching and exchanging information on health issues. Based on the answers, 93% of the people use the Internet to find information on health subjects. Considering principal component analysis, the most important health subjects were nutrition (0.719%), respiratory issues (0.79%), cardiological issues (0.777%), psychological issues (0.667%) and total (73.8%). The research results, based on different statistical techniques, revealed that 61.2% of the males and 56.4% of the females intended to use social networks for searching medical information. Based on the principal components analysis, the most important sources that the participants mentioned were the use of the Internet and social networks for exchanging information on health issues. These sources proved to be of paramount importance to the participants of the study. The same holds for nursing, medical and administrative staff in hospitals.

  12. Investigation of Microbial Diversity in Geothermal Hot Springs in Unkeshwar, India, Based on 16S rRNA Amplicon Metagenome Sequencing

    OpenAIRE

    Mehetre, Gajanan T.; Paranjpe, Aditi; Dastager, Syed G.; Dharne, Mahesh S.

    2016-01-01

    Microbial diversity in geothermal waters of the Unkeshwar hot springs in Maharashtra, India, was studied using 16S rRNA amplicon metagenomic sequencing. Taxonomic analysis revealed the presence of Bacteroidetes, Proteobacteria, Cyanobacteria, Actinobacteria, Archaea, and OD1 phyla. Metabolic function prediction analysis indicated a battery of biological information systems indicating rich and novel microbial diversity, with potential biotechnological applications in this niche.

  13. Using metagenomics to show the efficacy of forest restoration in the New Jersey Pine Barrens.

    Science.gov (United States)

    Eaton, William D; Shokralla, Shadi; McGee, Kathleen M; Hajibabaei, Mehrdad

    2017-10-01

    The Franklin Parker Preserve within the New Jersey Pine Barrens contains 5000 acres of wetlands habitat, including old-growth Atlantic white cedar (or AWC; Chamaecyparis thyoides) swamps, cranberry bogs, and former cranberry bogs undergoing restoration into AWC forests. This study showed that the C-use efficiency was greater in the old-growth AWC soils than in soils from 8-year-old mid-stage restored AWC stands, which in turn was greater than that found in soil from 4-year-old AWC stands, the latter two stands being restored from long-term cranberry bogs. A metagenomic analysis of eDNA extracted from these soils showed that the C-cycle trends were associated with increases in the relative numbers of DNA sequences from several copiotrophic bacterial groups (Bacteroidetes and Proteobacteria), complex C-decomposing fungal groups (Sordariomycetes, Mortierellales, and Thelephorales), and collembolan and formicid invertebrates. All groups are indicators of successionally more advanced soils, and critical for soil C-cycle activities. These data suggest that the restoration activities studied are enhancing critical guilds of soil biota, and increasing C-use efficiency in the soils of restored habitats, and that metagenomic analysis of soil eDNA can be used in the development of assessment models for soil recovery of wetlands following restoration.

  14. Metagenomic Survey of Viral Diversity Obtained from Feces of Subantarctic and South American Fur Seals.

    Directory of Open Access Journals (Sweden)

    Mariana Kluge

    The Brazilian South coast seasonally hosts numerous marine species, observed particularly during winter months. Some animals, including fur seals, are found dead or debilitated along the shore and may harbor potential pathogens within their microbiota. In the present study, a metagenomic approach was performed to evaluate the viral diversity in feces of fur seals found deceased along the coast of the state of Rio Grande do Sul. The fecal virome of two fur seal species was characterized: the South American fur seal (Arctocephalus australis) and the Subantarctic fur seal (Arctocephalus tropicalis). Fecal samples from 10 specimens (A. australis, n = 5; A. tropicalis, n = 5) were collected and viral particles were purified, extracted and amplified with a random PCR. The products were sequenced through Ion Torrent and Illumina platforms and assembled reads were submitted to BLASTx searches. Both viromes were dominated by bacteriophages and included a number of potentially novel virus genomes. Sequences of picobirnaviruses, picornaviruses and a hepevirus-like virus were identified in A. australis. A rotavirus related to group C, a novel member of the Sakobuvirus and a sapovirus very similar to California sea lion sapovirus 1 were found in A. tropicalis. Additionally, sequences of members of the Anelloviridae and Parvoviridae families were detected in both fur seal species. This is the first metagenomic study to screen the fecal virome of fur seals, contributing to a better understanding of the complexity of the viral community present in the intestinal microbiota of these animals.

  15. Combination of metagenomics and culture-based methods to study the interaction between ochratoxin a and gut microbiota.

    Science.gov (United States)

    Guo, Mingzhang; Huang, Kunlun; Chen, Siyuan; Qi, Xiaozhe; He, Xiaoyun; Cheng, Wen-Hsing; Luo, Yunbo; Xia, Kai; Xu, Wentao

    2014-09-01

    Gut microbiota represent an important bridge between environmental substances and host metabolism. Here we report a comprehensive study of gut microbiota interaction with ochratoxin A (OTA), a major food-contaminating mycotoxin, using the combination of metagenomics and culture-based methods. Rats were given OTA (0, 70, or 210 μg/kg body weight) by gavage and fecal samples were collected at day 0 and day 28. Bacterial genomic DNA was extracted from the fecal samples and both 16S rRNA and shotgun sequencing (two main methods of metagenomics) were performed. The results indicated that OTA treatment decreased the within-subject diversity of the gut microbiota, and the relative abundance of Lactobacillus increased considerably. Changes in functional genes of gut microbiota including signal transduction, carbohydrate transport, transposase, amino acid transport system, and mismatch repair were observed. To further understand the biological significance of the increased Lactobacillus, Lactobacillus selective medium was used to isolate Lactobacillus species from fecal samples, and a strain with 99.8% 16S rRNA similarity with Lactobacillus plantarum strain PFK2 was obtained. Thin-layer chromatography showed that this strain could absorb but not degrade OTA, which was in agreement with the metagenomic result that no genes related to OTA degradation increased. In conclusion, the combination of metagenomics and culture-based methods can be a new strategy to study intestinal toxicity of toxins and find applicable bacterial strains for detoxification. In the case of OTA, this mycotoxin can cause compositional and functional changes of gut microbiota, and Lactobacillus is a key genus for detoxifying OTA in vivo. © The Author 2014. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  16. Gene-Based Pathogen Detection: Can We Use qPCR to Predict the Outcome of Diagnostic Metagenomics?

    Directory of Open Access Journals (Sweden)

    Sandra Christine Andersen

    2017-11-01

    In microbial food safety, molecular methods such as quantitative PCR (qPCR) and next-generation sequencing (NGS) of bacterial isolates can potentially be replaced by diagnostic shotgun metagenomics. However, the methods for pre-analytical sample preparation are often optimized for qPCR, and do not necessarily perform equally well for qPCR and sequencing. The present study investigates, through screening of methods, whether qPCR can be used as an indicator for the optimization of sample preparation for NGS-based shotgun metagenomics with a diagnostic focus. This was used on human fecal samples spiked with 10^3 or 10^6 colony-forming units (CFU)/g Campylobacter jejuni, as well as porcine fecal samples spiked with 10^3 or 10^6 CFU/g Salmonella typhimurium. DNA was extracted from the samples using variations of two widely used kits. The following quality parameters were measured: DNA concentration, qPCR, DNA fragmentation during library preparation, amount of DNA available for sequencing, amount of sequencing data, distribution of data between samples in a batch, and data insert size; none showed any correlation with the target ratio of the spiking organism detected in sequencing data. Surprisingly, diagnostic metagenomics can have better detection sensitivity than qPCR for samples spiked with 10^3 CFU/g C. jejuni. The study also showed that qPCR and sequencing results may be different due to inhibition in one of the methods. In conclusion, qPCR cannot uncritically be used as an indicator for the optimization of sample preparation for diagnostic metagenomics.

  17. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    DEFF Research Database (Denmark)

    Moore, Aimee M.; Munck, Christian; Sommer, Morten Otto Alexander

    2011-01-01

    The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity...... microorganisms, but relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community, independent of identity to known genes, by subjecting the metagenome to functional assays in a genetically tractable host....... Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex...

  18. The potential of viral metagenomics in blood transfusion safety.

    Science.gov (United States)

    Sauvage, V; Gomez, J; Boizeau, L; Laperche, S

    2017-09-01

    Thanks to the advent of high throughput sequencing in the last ten years, it is now possible via metagenomics to define the spectrum of the microbial sequences present in human blood samples. Metagenomic sequencing therefore appears to be a promising approach for the identification and global surveillance of new, emerging and/or unexpected viruses that could impair blood transfusion safety. However, despite considerable advantages compared to the traditional methods of pathogen identification, this non-targeted approach presents several drawbacks, including a lack of sensitivity and sequence contaminant issues. With further improvements, especially to increase sensitivity, metagenomic sequencing should in the near future become an additional diagnostic tool in the infectious disease field, and especially in blood transfusion safety. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  19. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    Functional metagenomic selections represent a powerful technique that is widely applied for identification of novel genes from complex metagenomic sources. However, whereas hundreds to thousands of clones can be easily generated and sequenced over a few days of experiments, analyzing the data...... is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non......-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence...

  20. SLIMM: species level identification of microorganisms from metagenomes

    Directory of Open Access Journals (Sweden)

    Temesgen Hailemariam Dadi

    2017-03-01

    Identification and quantification of microorganisms is a significant step in studying the alpha and beta diversities within and between microbial communities respectively. Both identification and quantification of a given microbial community can be carried out using whole genome shotgun sequences with less bias than when using 16S-rDNA sequences. However, shared regions of DNA among reference genomes and taxonomic units pose a significant challenge in assigning reads correctly to their true origins. The existing microbial community profiling tools commonly deal with this problem by either preparing signature-based unique references or assigning an ambiguous read to its least common ancestor in a taxonomic tree. The former method is limited to making use of the reads which can be mapped to the curated regions, while the latter suffers from the lack of uniquely mapped reads at lower (more specific) taxonomic ranks. Moreover, even if the tools exhibited good performance in calling the organisms present in a sample, there is still room for improvement in determining the correct relative abundance of the organisms. We present a new method, Species Level Identification of Microorganisms from Metagenomes (SLIMM), which addresses the above issues by using coverage information of reference genomes to remove unlikely genomes from the analysis and subsequently gain more uniquely mapped reads to assign at lower ranks of a taxonomic tree. SLIMM is based on a few, seemingly easy steps which when combined create a tool that outperforms state-of-the-art tools in run-time and memory usage while being on par or better in computing quantitative and qualitative information at species-level.
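
    The coverage-based filtering idea in the abstract above can be sketched as follows. This is a minimal illustration, not SLIMM's actual implementation: the function names, the input layout (a mapping from read ID to candidate genome hits), the fixed read length, and the `min_breadth` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def coverage_breadth(read_hits, genome_lengths, read_length):
    """Fraction of each genome's positions covered by at least one read.

    read_hits: {read_id: [(genome, start), ...]}  (hypothetical layout)
    """
    covered = defaultdict(set)
    for hits in read_hits.values():
        for genome, start in hits:
            end = min(start + read_length, genome_lengths[genome])
            covered[genome].update(range(start, end))
    return {g: len(covered[g]) / genome_lengths[g] for g in genome_lengths}


def filter_and_reassign(read_hits, genome_lengths, read_length, min_breadth):
    """Drop poorly covered genomes, then collect reads that map uniquely."""
    breadth = coverage_breadth(read_hits, genome_lengths, read_length)
    kept = {g for g, b in breadth.items() if b >= min_breadth}
    unique = {}
    for read, hits in read_hits.items():
        genomes = {g for g, _ in hits if g in kept}
        if len(genomes) == 1:
            unique[read] = next(iter(genomes))
    return kept, unique


# Toy data: read r3 is ambiguous between genomes A and B.
genome_lengths = {"A": 100, "B": 100}
read_hits = {
    "r1": [("A", 0)],
    "r2": [("A", 10)],
    "r3": [("A", 20), ("B", 50)],
    "r4": [("A", 30)],
}
kept, unique = filter_and_reassign(read_hits, genome_lengths, 10, 0.2)
print(kept, unique)
```

    In this toy case genome B is covered over only a tenth of its length and falls below the threshold, so the ambiguous read r3 becomes uniquely assignable to genome A once B is removed.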

  1. Building Recognition on Subregion’s Multiscale Gist Feature Extraction and Corresponding Columns Information Based Dimensionality Reduction

    Directory of Open Access Journals (Sweden)

    Bin Li

    2014-01-01

    In this paper, we propose a new building recognition method named subregion’s multiscale gist feature (SM-gist) extraction and corresponding columns information based dimensionality reduction (CCI-DR). Our proposed building recognition method is presented as a two-stage model: in the first stage, a building image is divided into 4 × 5 subregions, and gist vectors are extracted from these regions individually. Then, we combine these gist vectors into a matrix with relatively high dimensions. In the second stage, we propose CCI-DR to project the high dimensional manifold matrix to a low dimensional subspace. Compared with previous building recognition methods, the advantages of our proposed method are that (1) gist features extracted by SM-gist have the ability to adapt to nonuniform illumination and that (2) CCI-DR can address the limitation of traditional dimensionality reduction methods, which convert gist matrices into vectors and thus mix the corresponding gist vectors from different feature maps. Our building recognition method is evaluated on the Sheffield buildings database, and experiments show that our method can achieve satisfactory performance.
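
    The first stage described above (split the image into 4 × 5 subregions, describe each one, and stack the descriptors into a matrix) might look roughly like the sketch below. It is an illustration under stated assumptions: the grid is taken as 4 rows by 5 columns, and `placeholder_descriptor` (mean and variance of pixel intensities) is a stand-in for the gist descriptor, which the abstract does not specify in detail.

```python
def split_into_subregions(image, rows=4, cols=5):
    """Split a 2-D image (list of pixel rows) into rows x cols subregions."""
    height, width = len(image), len(image[0])
    row_edges = [r * height // rows for r in range(rows + 1)]
    col_edges = [c * width // cols for c in range(cols + 1)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append([
                row[col_edges[c]:col_edges[c + 1]]
                for row in image[row_edges[r]:row_edges[r + 1]]
            ])
    return regions


def placeholder_descriptor(region):
    """Stand-in for a gist vector: mean and variance of the pixel values."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, variance]


def feature_matrix(image, rows=4, cols=5):
    """Stack one descriptor per subregion as the columns of a matrix."""
    descriptors = [placeholder_descriptor(region)
                   for region in split_into_subregions(image, rows, cols)]
    return [list(column) for column in zip(*descriptors)]


# Synthetic 40 x 50 grayscale image.
image = [[float(x + y) for x in range(50)] for y in range(40)]
matrix = feature_matrix(image)
print(len(matrix), len(matrix[0]))
```

    Keeping one column per subregion preserves the correspondence between descriptors and image regions, which is the property the abstract says is lost when gist matrices are flattened into vectors.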

  2. Advanced feature extraction in remote sensing using artificial intelligence and geographic information systems

    Science.gov (United States)

    Estes, John E.; Friedl, Mark A.; Star, Jeffrey L.

    1988-01-01

    Traditional computer-assisted image-analysis techniques in remote sensing lag well behind human abilities in terms of both speed and accuracy. A fundamental limitation of computer-assisted techniques is their inability to assimilate a variety of different data types leading to an interpretation in a manner similar to human image interpretation. Expert systems and computer-vision techniques are proposed as a potential solution to these limitations. Some aspects of human expertise in image analysis may be codified into expert systems. Image understanding and symbolic reasoning provide a means of assimilating spatial information and spatial reasoning into the analysis procedure. Knowledge-based image-analysis systems incorporate many of these concepts and have been implemented for some well defined problem domains. Geographic information systems represent an excellent environment for this type of analysis, providing both analytic tools and contextual information to the analysis procedure.

  3. Multi-Layer and Recursive Neural Networks for Metagenomic Classification.

    Science.gov (United States)

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-09-01

    Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches to metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.

  4. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  5. Intelligent multimedia indexing and retrieval through multi-source information extraction and merging

    NARCIS (Netherlands)

    Kuper, Jan; Saggion, H.; Cunningham, H.; Declerck, T.; de Jong, Franciska M.G.; Reidsma, Dennis; Wilks, Y.; Wittenburg, P.

    This paper reports work on automated meta-data creation for multimedia content. The approach results in the generation of a conceptual index of the content which may then be searched via semantic categories instead of keywords. The novelty of the work is to exploit multiple sources of information

  6. What do Professional Forecasters' Stock Market Expectations Tell Us about Herding, Information Extraction and Beauty Contests?

    DEFF Research Database (Denmark)

    Rangvid, Jesper; Schmeling, Maik; Schrimpf, Andreas

    2013-01-01

    We study how professional forecasters form equity market expectations based on a new micro-level dataset which includes rich cross-sectional information about individual characteristics. We focus on testing whether agents rely on the beliefs of others, i.e., consensus expectations, when forming...

  7. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    Science.gov (United States)

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  8. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats

    NARCIS (Netherlands)

    Pérez-Rodríguez, F.; Asselt, van E.D.; García-Gimeno, R.M.; Zurera, G.; Zwietering, M.H.

    2007-01-01

    The risk assessment study of Listeria monocytogenes in ready-to-eat foods conducted by the U.S. Food and Drug Administration is an example of an extensive quantitative microbiological risk assessment that could be used by risk analysts and other scientists to obtain information and by managers and

  9. The Promise of Information and Communication Technology in Healthcare: Extracting Value From the Chaos.

    Science.gov (United States)

    Mamlin, Burke W; Tierney, William M

    2016-01-01

    Healthcare is an information business with expanding use of information and communication technologies (ICTs). Current ICT tools are immature, but a brighter future looms. We examine 7 areas of ICT in healthcare: electronic health records (EHRs), health information exchange (HIE), patient portals, telemedicine, social media, mobile devices and wearable sensors and monitors, and privacy and security. In each of these areas, we examine the current status and future promise, highlighting how each might reach its promise. Steps to better EHRs include a universal programming interface, universal patient identifiers, improved documentation and improved data analysis. HIEs require federal subsidies for sustainability and support from EHR vendors, targeting seamless sharing of EHR data. Patient portals must bring patients into the EHR with better design and training, greater provider engagement and leveraging HIEs. Telemedicine needs sustainable payment models, clear rules of engagement, quality measures and monitoring. Social media needs consensus on rules of engagement for providers, better data mining tools and approaches to counter disinformation. Mobile and wearable devices benefit from a universal programming interface, improved infrastructure, more rigorous research and integration with EHRs and HIEs. Laws for privacy and security need updating to match current technologies, and data stewards should share information on breaches and standardize best practices. ICT tools are evolving quickly in healthcare and require a rational and well-funded national agenda for development, use and assessment. Copyright © 2016 Southern Society for Clinical Investigation. Published by Elsevier Inc. All rights reserved.

  10. Metagenomic Approaches to Assess Bacteriophages in Various Environmental Niches.

    Science.gov (United States)

    Hayes, Stephen; Mahony, Jennifer; Nauta, Arjen; van Sinderen, Douwe

    2017-05-24

    Bacteriophages are ubiquitous and numerous parasites of bacteria and play a critical evolutionary role in virtually every ecosystem, yet our understanding of the extent of the diversity and role of phages remains inadequate for many ecological niches, particularly in cases in which the host is unculturable. During the past 15 years, the emergence of the field of viral metagenomics has drastically enhanced our ability to analyse the so-called viral 'dark matter' of the biosphere. Here, we review the evolution of viral metagenomic methodologies, as well as providing an overview of some of the most significant applications and findings in this field of research.

  11. Surveillance of Foodborne Pathogens: Towards Diagnostic Metagenomics of Fecal Samples

    DEFF Research Database (Denmark)

    Andersen, Sandra Christine; Hoorfar, Jeffrey

    2018-01-01

    Diagnostic metagenomics is a rapidly evolving laboratory tool for culture-independent tracing of foodborne pathogens. The method has the potential to become a generic platform for detection of most pathogens and many sample types. Today, however, it is still at an early and experimental stage...... for data analysis are being developed, and several studies applying diagnostic metagenomics to human clinical samples have been published, detecting, and sometimes, typing bacterial infections. It is possible to obtain a draft genome of the pathogen and to develop methods that can theoretically be applied...... in fecal samples from animals and humans....

  12. Analysis Methods for Extracting Knowledge from Large-Scale WiFi Monitoring to Inform Building Facility Planning

    DEFF Research Database (Denmark)

    Ruiz-Ruiz, Antonio; Blunck, Henrik; Prentow, Thor Siiger

    2014-01-01

    The optimization of logistics in large building complexes with many resources, such as hospitals, requires realistic facility management and planning. Current planning practices rely foremost on manual observations or coarse unverified assumptions and therefore do not properly scale or provide...... realistic data to inform facility planning. In this paper, we propose analysis methods to extract knowledge from large sets of network collected WiFi traces to better inform facility management and planning in large building complexes. The analysis methods, which build on a rich set of temporal and spatial...... features, include methods for noise removal, e.g., labeling of beyond building-perimeter devices, and methods for quantification of area densities and flows, e.g., building enter and exit events, and for classifying the behavior of people, e.g., into user roles such as visitor, hospitalized or employee...

  13. Adaptive extraction of emotion-related EEG segments using multidimensional directed information in time-frequency domain.

    Science.gov (United States)

    Petrantonakis, Panagiotis C; Hadjileontiadis, Leontios J

    2010-01-01

    Emotion discrimination from electroencephalogram (EEG) has gained attention over the last decade as a user-friendly and effective approach to EEG-based emotion recognition (EEG-ER) systems. Nevertheless, challenging issues arise regarding the emotion elicitation procedure, especially its effectiveness. In this work, a novel method is proposed that not only evaluates the degree of emotion elicitation but also localizes the emotion information in the time-frequency domain. The latter incorporates multidimensional directed information at the time-frequency EEG representation, extracted using empirical mode decomposition, and introduces an asymmetry index for adaptive emotion-related EEG segment selection. Experimental results derived from 16 subjects visually stimulated with pictures from the valence/arousal space drawn from the International Affective Picture System database justify the effectiveness of the proposed approach and its potential contribution to the enhancement of EEG-ER systems.

  14. Inexperienced clinicians can extract pathoanatomic information from MRI narrative reports with high reproducibility for use in research/quality assurance

    DEFF Research Database (Denmark)

    Kent, Peter; Briggs, Andrew M; Albert, Hanne Birgit

    2011-01-01

    and transforming that information into quantitative data. However, this process is frequently required in research and quality assurance contexts. The purpose of this study was to examine inter-rater reproducibility (agreement and reliability) among an inexperienced group of clinicians in extracting spinal...... of radiological training is not required in order to transform MRI-derived pathoanatomic information from a narrative format to a quantitative format with high reproducibility for research or quality assurance purposes....... a categorical electronic coding matrix. Decision rules were developed after initial coding in an effort to resolve ambiguities in narrative reports. This process was repeated a further three times using separate samples of 20 MRI reports until no further ambiguities were identified (total n=80). Reproducibility...

  15. Extracting Feature Information and its Visualization Based on the Characteristic Defect Octave Frequencies in a Rolling Element Bearing

    Directory of Open Access Journals (Sweden)

    Jianyu Lei

    2007-10-01

    Monitoring the condition of rolling element bearings and defect diagnosis has received considerable attention for many years because the majority of problems in rotating machines are caused by defective bearings. In order to monitor conditions and diagnose defects in a rolling element bearing, a new approach is developed, based on the characteristic defect octave frequencies. The characteristic defect frequencies make it possible to detect the presence of a defect and diagnose in what part of the bearing the defect appears. However, because the characteristic defect frequencies vary with rotational speed, it is difficult to extract feature information from data at variable rotational speeds. In this paper, the characteristic defect octave frequencies, which do not vary with rotational speed, are introduced to replace the characteristic defect frequencies, so that feature information can be easily extracted. Moreover, based on characteristic defect octave frequencies, an envelope spectrum array, which associates 3-D visualization technology with extremum envelope spectrum technology, is established. This method has great advantages in acquiring the characteristics and trends of the data and achieves a straightforward and credible result.

  16. Extraction of indirectly captured information for use in a comparison of offline pH measurement technologies.

    Science.gov (United States)

    Ritchie, Elspeth K; Martin, Elaine B; Racher, Andy; Jaques, Colin

    2017-06-10

    Understanding the causes of discrepancies in pH readings of a sample can allow more robust pH control strategies to be implemented. It was found that 59.4% of differences between two offline pH measurement technologies for an historical dataset lay outside an expected instrument error range of ±0.02 pH. A new variable, Osmo Res, was created using multiple linear regression (MLR) to extract information indirectly captured in the recorded measurements for osmolality. Principal component analysis and time series analysis were used to validate the expansion of the historical dataset with the new variable Osmo Res. MLR was used to identify variables strongly correlated (p < 0.05) with differences in pH readings by the two offline pH measurement technologies. These included concentrations of specific chemicals (e.g. glucose) and Osmo Res, indicating culture medium and bolus feed additions as possible causes of discrepancies between the offline pH measurement technologies. Temperature was also identified as statistically significant. It is suggested that this was a result of differences in pH-temperature compensations employed by the pH measurement technologies. In summary, a method for extracting indirectly captured information has been demonstrated, and it has been shown that competing pH measurement technologies were not necessarily interchangeable at the desired level of control (±0.02 pH). Copyright © 2017 Elsevier B.V. All rights reserved.
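
    The headline statistic in this abstract, the share of paired readings whose difference falls outside the expected instrument error range, is straightforward to compute. The sketch below assumes paired readings from the two instruments are available as equal-length lists; the function name and the sample values are hypothetical and are not taken from the study.

```python
def fraction_outside_tolerance(readings_a, readings_b, tolerance=0.02):
    """Share of paired pH readings whose difference exceeds the tolerance."""
    if len(readings_a) != len(readings_b):
        raise ValueError("readings must be paired one-to-one")
    differences = [a - b for a, b in zip(readings_a, readings_b)]
    outside = sum(1 for d in differences if abs(d) > tolerance)
    return outside / len(differences)


# Hypothetical paired offline readings from two pH measurement technologies.
instrument_a = [7.00, 7.10, 6.95, 7.20]
instrument_b = [7.01, 7.15, 6.95, 7.16]
print(fraction_outside_tolerance(instrument_a, instrument_b))
```

    A value well above zero, as reported in the abstract, indicates that the two technologies disagree more often than instrument error alone would explain.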

  17. Meeting Report: “Metagenomics, Metadata and Meta-analysis” (M3) Special Interest Group at ISMB 2009

    Science.gov (United States)

    Field, Dawn; Friedberg, Iddo; Sterk, Peter; Kottmann, Renzo; Glöckner, Frank Oliver; Hirschman, Lynette; Garrity, George M.; Cochrane, Guy; Wooley, John; Gilbert, Jack

    2009-01-01

    This report summarizes the proceedings of the “Metagenomics, Metadata and Meta-analysis” (M3) Special Interest Group (SIG) meeting held at the Intelligent Systems for Molecular Biology 2009 conference. The Genomic Standards Consortium (GSC) hosted this meeting to explore the bottlenecks and emerging solutions for obtaining biological insights through large-scale comparative analysis of metagenomic datasets. The M3 SIG included 16 talks, half of which were selected from submitted abstracts, a poster session and a panel discussion involving members of the GSC Board. This report summarizes this one-day SIG, attempts to identify shared themes and recapitulates community recommendations for the future of this field. The GSC will also host an M3 workshop at the Pacific Symposium on Biocomputing (PSB) in January 2010. Further information about the GSC and its range of activities can be found at http://gensc.org/. PMID:21304668

  18. You had me at "Hello": Rapid extraction of dialect information from spoken words.

    Science.gov (United States)

    Scharinger, Mathias; Monahan, Philip J; Idsardi, William J

    2011-06-15

    Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course of speaker identity, and in particular, dialect processing and identification by electro- or neuromagnetic means. We show here that dialect extraction occurs speaker-independently, pre-attentively and categorically. We used Standard American English and African-American English exemplars of 'Hello' in a magnetoencephalographic (MEG) Mismatch Negativity (MMN) experiment. The MMN, as an automatic change detection response of the brain, reflected dialect differences that were not entirely reducible to acoustic differences between the pronunciations of 'Hello'. Source analyses of the M100, an auditory evoked response to the vowels, suggested additional processing in voice-selective areas whenever a dialect change was detected. These findings are not only relevant for the cognitive neuroscience of language, but also for the social sciences concerned with dialect and race perception. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Singularity analysis of multisource geodatasets for information extraction and integration for mineral prospecting

    Science.gov (United States)

    Zhao, Jie

    2015-04-01

    Local singularity analysis (LSA), developed in the context of fractals and multifractals, has been used as an efficient way to characterize non-linear geological phenomena such as floods, landslides and tectonic evolution. In the past decade, this technique has been successfully applied to mineral exploration in many areas and for various types of mineral deposits. Mineralization, as a cascade geo-process, is associated with multiple geological processes including tectono-magmatism, sedimentation, and metamorphism, which often show non-linear properties. Proper description of the spatial distributions of these geological bodies can benefit mineral exploration. The current study concerns the identification of geological bodies in the eastern Tianshan mineral district, a Gobi-desert area with thick overburden. The LSA technique was used to identify weak anomalies associated with mineralization against both weak and strong backgrounds. In addition to geochemical data, this study employs LSA to analyze geological and geophysical data, extracting geo-anomalies for mapping the spatial distributions of felsic intrusions, intermediate-mafic volcanic strata, and faults that are associated with iron mineralization. Furthermore, these diverse geo-anomalies are integrated to delineate targets for iron mineral deposits in the covered areas.

  20. Contribution of exogenous genetic elements to the group A Streptococcus metagenome.

    Directory of Open Access Journals (Sweden)

    Stephen B Beres

    2007-08-01

    provide new information about the GAS metagenome and will assist studies of pathogenesis, antimicrobial resistance, and population genomics.

  1. An artificial functional family filter in homolog searching in next-generation sequencing metagenomics.

    Directory of Open Access Journals (Sweden)

    Ruofei Du

    Full Text Available In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. No method has been available to address this problem, and this paper aims to fill that gap. We show that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We show that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also show that, by combining BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust to read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature.
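
    The score cut-off step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hit tuples, the family identifiers, the `min_score` value and the function name `build_profile` are all hypothetical, and the abstract does not state how the cut-off is chosen.

```python
from collections import Counter

def build_profile(hits, min_score):
    """Count reads per family, keeping only hits whose score meets the cut-off.

    hits: iterable of (read_id, family_id, score) tuples, one best hit per read.
    min_score: hypothetical similarity-score cut-off (not taken from the paper).
    """
    profile = Counter()
    for _read_id, family, score in hits:
        if score >= min_score:
            profile[family] += 1
    return dict(profile)

# Illustrative, made-up hits: two families are supported by high-scoring reads,
# one family is supported only by low-scoring reads.
hits = [
    ("r1", "COG0001", 62.0),
    ("r2", "COG0001", 55.5),
    ("r3", "COG0002", 31.0),
    ("r4", "COG0003", 48.2),
    ("r5", "COG0002", 29.4),
]
profile = build_profile(hits, min_score=40.0)
print(profile)
```

    In this toy input the family supported only by low-scoring reads drops out of the profile, which is the effect the abstract attributes to a suitable cut-off.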

  2. Information extracting and processing with diffraction enhanced imaging of X-ray

    International Nuclear Information System (INIS)

    Chen Bo; Chinese Academy of Science, Beijing; Chen Chunchong; Jiang Fan; Chen Jie; Ming Hai; Shu Hang; Zhu Peiping; Wang Junyue; Yuan Qingxi; Wu Ziyu

    2006-01-01

    X-ray imaging at high energies has been used for many years in many fields. Conventional X-ray imaging is based on differences in absorption within a sample. It is difficult to distinguish different tissues of a biological sample because their absorption differs only slightly. The authors use the diffraction enhanced imaging (DEI) method. The authors took images of absorption, extinction, scattering and refractivity. Finally, the authors present high-resolution pictures that combine all of this information. (authors)

  3. CLASSIFICATION OF INFORMAL SETTLEMENTS THROUGH THE INTEGRATION OF 2D AND 3D FEATURES EXTRACTED FROM UAV DATA

    Directory of Open Access Journals (Sweden)

    C. M. Gevaert

    2016-06-01

    Full Text Available Unmanned Aerial Vehicles (UAVs) are capable of providing very high resolution and up-to-date information to support informal settlement upgrading projects. In order to provide accurate basemaps, urban scene understanding through the identification and classification of buildings and terrain is imperative. However, common characteristics of informal settlements, such as small, irregular buildings with heterogeneous roof material and a large amount of clutter, challenge state-of-the-art algorithms. The dense buildings and steeply sloped terrain in particular cause difficulties in identifying elevated objects. This work investigates how 2D radiometric and textural features, 2.5D topographic features, and 3D geometric features obtained from UAV imagery can be integrated to obtain a high classification accuracy in challenging classification problems for the analysis of informal settlements. It compares the utility of pixel-based and segment-based features obtained from an orthomosaic and DSM with point-based and segment-based features extracted from the point cloud to classify an unplanned settlement in Kigali, Rwanda. Findings show that the integration of 2D and 3D features leads to higher classification accuracies.

  4. Note on difference spectra for fast extraction of global image information.

    CSIR Research Space (South Africa)

    Van Wyk, BJ

    2007-06-01

    Full Text Available RESULTS The Difference Spectrum and Vincent's Linear Greyscale Pattern Spectrum [4] have been used to classify greyscale QuickBird satellite images over Soweto as formal suburbs or informal settlements. Since Vincent has... from selected Soweto suburbs, labelled by a built environment expert from the South African Centre for Scientific and Industrial Research, were equally divided into a training set and a test set. For all images N=M=200. Refer to Figures 1 and 2...

  5. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Full Text Available Abstract Background Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen-derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P  Conclusions We have presented a simple and high throughput method of
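
    The profile construction described in this abstract (counting the reads that align to each contig of a reference metagenome) might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input format, the contig names and the function `metagenome_profile` are hypothetical, and the normalisation to fractions is an added assumption that the abstract does not mention.

```python
from collections import Counter

def metagenome_profile(alignments, contigs):
    """Return the fraction of aligned reads assigned to each reference contig.

    alignments: iterable of (read_id, contig_id) pairs, one per aligned read.
    contigs: ordered list of contig identifiers in the reference metagenome.
    Dividing by the total is an assumption made here so that samples with
    different sequencing depths can be compared.
    """
    counts = Counter(contig for _read_id, contig in alignments)
    total = sum(counts[c] for c in contigs)
    if total == 0:
        return [0.0 for _ in contigs]
    return [counts[c] / total for c in contigs]

# Made-up example: four aligned reads against a three-contig reference.
contigs = ["contig_1", "contig_2", "contig_3"]
alignments = [
    ("r1", "contig_1"),
    ("r2", "contig_1"),
    ("r3", "contig_3"),
    ("r4", "contig_1"),
]
profile = metagenome_profile(alignments, contigs)
print(profile)
```

    Profiles built this way share one fixed contig order, so profiles from different samples can be compared element by element.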

  6. Proteomics as the final step in the functional metagenomics study of antimicrobial resistance

    Directory of Open Access Journals (Sweden)

    Fiona eFouhy

    2015-03-01

    Full Text Available The majority of clinically applied antimicrobial agents are derived from natural products generated by soil microorganisms and therefore resistance is likely to be ubiquitous in such environments. This is supported by the fact that numerous clinically important resistance mechanisms are encoded within the chromosomes of such bacteria. Advances in genomic sequencing have enabled the in silico identification of putative resistance genes present in these microorganisms. However, it is not sufficient to rely on the identification of putative resistance genes; we must also determine if the resultant proteins confer a resistant phenotype. This will require an analysis pipeline that extends from the extraction of environmental DNA to the identification and analysis of potential resistance genes and their resultant proteins and phenotypes. This review focuses on the application of functional metagenomics and proteomics to study antimicrobial resistance in diverse environments.

  7. EXTRACTION OF BENTHIC COVER INFORMATION FROM VIDEO TOWS AND PHOTOGRAPHS USING OBJECT-BASED IMAGE ANALYSIS

    Directory of Open Access Journals (Sweden)

    M. T. L. Estomata

    2012-07-01

    Full Text Available Mapping benthic cover in deep waters comprises a very small proportion of studies in the field. The majority of benthic cover mapping makes use of satellite images, and classification is usually carried out only for shallow waters. To map the seafloor in optically deep waters, underwater videos and photos are needed. Some researchers have applied this method to underwater photos, but made use of different classification methods such as neural networks and rapid classification via downsampling. In this study, an attempt was made to use accurate bathymetric data obtained with a multi-beam echo sounder (MBES) as complementary data to the underwater photographs. Due to the absence of a motion reference unit (MRU), which applies corrections to the data gathered by the MBES, the accuracy of the depth data was compromised. Nevertheless, even without accurate bathymetric data, object-based image analysis (OBIA), which used rule sets based on information such as shape, size, area, relative distance, and spectral information, was still applied. Compared to pixel-based classifications, OBIA was able to classify more specific benthic cover types other than coral and sand, such as rubble and fish. Fish and rubble were identified through rule sets on area (less than or equal to 700 pixels for fish and between 700 and 10,000 pixels for rubble) together with standard deviation values to distinguish texture. OBIA produced benthic cover maps that had a higher overall accuracy, 93.78±0.85%, compared to pixel-based methods, which had an average accuracy of only 87.30±6.11% (p-value = 0.0001, α = 0.05).
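
    The area rule quoted in this abstract can be written as a small rule-set sketch. Only the two area thresholds (700 and 10,000 pixels) come from the abstract. The function name, the "other" label and the treatment of the upper boundary as inclusive are assumptions, and the texture rule based on standard deviation is left out because its thresholds are not given.

```python
def classify_by_area(area_px):
    """Assign a benthic cover label to an image object from its area in pixels.

    Thresholds follow the abstract: <= 700 px for fish, 700 to 10,000 px for rubble.
    Objects larger than 10,000 px get the hypothetical label "other".
    """
    if area_px <= 700:
        return "fish"
    if area_px <= 10_000:  # inclusive upper bound is an assumption
        return "rubble"
    return "other"

for area in (350, 700, 701, 10_000, 25_000):
    print(area, classify_by_area(area))
```

    A full OBIA rule set would combine this test with the shape, distance, spectral and texture rules that the abstract mentions but does not specify.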

  8. An algorithm for detecting eukaryotic sequences in metagenomic ...

    Indian Academy of Sciences (India)

    a BLAST search of all these sequences against a database containing sequences of a host genome (e.g. human genome) will take enormous amount of time and computing resources. In this article, we present a novel alignment-free algorithm, called Eu-Detect, that can detect eukaryotic sequences in metagenomic data ...

  9. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    Directory of Open Access Journals (Sweden)

    Aimee Marguerite Moore

    2011-10-01

    Full Text Available The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity of this microbial community, its recalcitrance to standard cultivation and the immense diversity of its encoded genes have necessitated the development of novel molecular, microbiological, and genomic tools. Functional metagenomics is one such culture-independent technique, used for decades to study environmental microorganisms but only relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community independent of identity to known genes by subjecting the metagenome to functional assays in a genetically tractable host. Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex community and its human host.

  10. An algorithm for detecting eukaryotic sequences in metagenomic ...

    Indian Academy of Sciences (India)

    species but also from accidental contamination from the genome of eukaryotic host cells. The latter scenario generally occurs in the case of host-associated metagenomes, e.g. microbes living in human gut. In such cases, one needs to identify and remove contaminating host DNA sequences, since the latter sequences will ...

  11. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets

    DEFF Research Database (Denmark)

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole

    2016-01-01

    and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate for the mosaic...

  12. Toward a standards-compliant genomic and metagenomic publication record.

    NARCIS (Netherlands)

    Garrity, G.M.; Field, D.; Kyrpides, N.; Hirschman, L.; Sansone, S.A.; Angiuoli, S.; Cole, J.R.; Glockner, F.O.; Kolker, E.; Kowalchuk, G.A.; Moran, M.A.; Ussery, D.; White, O.

    2008-01-01

    Increasingly, we are aware as a community of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a way that tightly integrates contextual data with traditional literature in a machine-readable way.

  13. Toward a Standards-Compliant Genomic and Metagenomic Publication Record

    NARCIS (Netherlands)

    Garrity, G.; Field, D.; Kyrpides, N.; Hirschman, L.; Sansone, S-A.; Angiuoli, S.V.; Cole, J.; Glöckner, F.O.; Kolker, E.; Kowalchuk, G.A.; Moran, M.A.; Ussery, D.; White, O.

    2008-01-01

    Increasingly, we are aware as a community of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a way that tightly integrates contextual data with traditional literature in a machine-readable way.

  14. Metagenomic species profiling using universal phylogenetic marker genes

    NARCIS (Netherlands)

    Sunagawa, S.; Mende, D.R.; Zeller, G.; Izquierdo-Carrasco, F.; Berger, S.A.; Kultima, J.R.; Coelho, L.P.; Arumugam, M.; Tap, J.; Nielsen, H.B.; Rasmussen, S.; Brunak, S.; Pedersen, O.; Guarner, F.; Vos, de W.M.; Wang, J.; Li, J.; Doré, J.; Ehrlich, S.D.; Stamatakis, A.; Bork, P.

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that

  15. Metagenomic Analysis of the Ferret Fecal Viral Flora

    NARCIS (Netherlands)

    S.L. Smits (Saskia); V.S. Raj (Stalin); M. Oduber (Minoushka); C.M.E. Schapendonk (Claudia); R. Bodewes (Rogier); L.B.V. Provacia (Lisette); K.J. Stittelaar (Koert); A.D.M.E. Osterhaus (Albert); B.L. Haagmans (Bart)

    2013-01-01

    Ferrets are widely used as a small animal model for a number of viral infections, including influenza A virus and SARS coronavirus. To further analyze the microbiological status of ferrets, their fecal viral flora was studied using a metagenomics approach. Novel viruses from the families

  16. MetaGenomic Assembly by Merging (MeGAMerge)

    Energy Technology Data Exchange (ETDEWEB)

    2015-08-03

    "MetaGenomic Assembly by Merging" (MeGAMerge) is a novel method for merging multiple genomic assemblies or long-read data sources for assembly, using internal trimming/filtering of the data followed by two third-party tools that merge the data by overlap-based assembly.

  17. Metagenomic species profiling using universal phylogenetic marker genes

    DEFF Research Database (Denmark)

    Sunagawa, Shinichi; Mende, Daniel R; Zeller, Georg

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed...

  18. metaSNV: A tool for metagenomic strain level analysis.

    Directory of Open Access Journals (Sweden)

    Paul Igor Costea

    Full Text Available We present metaSNV, a tool for single nucleotide variant (SNV) analysis in metagenomic samples, capable of comparing populations of thousands of bacterial and archaeal species. The tool uses as input nucleotide sequence alignments to reference genomes in standard SAM/BAM format, performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species including allele frequencies and nucleotide diversity per sample as well as distances and fixation indices across samples. Using published data from 676 metagenomic samples of different sites in the oral cavity, we show that the results of metaSNV are comparable to those of MIDAS, an alternative implementation for metagenomic SNV analysis, while data processing is faster and has a smaller storage footprint. Moreover, we implement a set of distance measures that allow the comparison of genomic variation across metagenomic samples and delineate sample-specific variants to enable the tracking of specific strain populations over time. The implementation of metaSNV is available at: http://metasnv.embl.de/.
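
    Two of the per-sample statistics named in this abstract, allele frequencies and nucleotide diversity, can be illustrated with a short sketch. It does not reproduce metaSNV's code or its exact formulas: the functions and the per-position base counts are hypothetical, and the diversity estimator used here (the probability that two reads drawn without replacement carry different bases, averaged over positions) is a common textbook choice adopted as an assumption.

```python
def allele_frequencies(counts):
    """Frequency of each observed base at one position.

    counts: dict mapping base -> number of reads supporting it (at least one read).
    """
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items() if n > 0}

def site_diversity(counts):
    """Probability that two reads drawn without replacement differ at this site."""
    total = sum(counts.values())
    if total < 2:
        return 0.0
    same_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_pairs / (total * (total - 1))

def nucleotide_diversity(pileup):
    """Average per-site diversity over all positions in a (non-empty) pileup."""
    return sum(site_diversity(counts) for counts in pileup) / len(pileup)

# Made-up base counts for two positions in one sample.
pileup = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # monomorphic site
    {"A": 5, "C": 5, "G": 0, "T": 0},   # evenly split biallelic site
]
print(allele_frequencies(pileup[1]))
print(nucleotide_diversity(pileup))
```

    In practice the base counts would be derived from the SAM/BAM alignments that the tool takes as input, a step that is omitted here.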

  19. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

    Science.gov (United States)

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210

  20. EXTRACTION OF VINEYARDS OUT OF AERIAL ORTHO-IMAGE USING TEXTURE INFORMATION

    Directory of Open Access Journals (Sweden)

    A. Le Bris

    2012-07-01

    Full Text Available A cartography of vineyards is required by many mapping agencies, both to draw topographic maps and to complete the "vineyard" layer of large scale land cover databases. In this paper, two distinct approaches are proposed and tested to achieve (semi-)automatic detection of vineyards in ortho-images with a 50 cm ground resolution. Both are object-based approaches relying on image texture analysis in homogeneous land cover regions. Therefore, the first step (common to both approaches) is a segmentation of the image into homogeneous land cover regions. These regions can then be classified as vineyards or not by either approach. The first approach consists of a frequency analysis of the image texture in each region. A semi-variogram is first calculated from the ortho-image for each region of the segmentation. A Fourier transform (FFT) of this semi-variogram is then considered. If a periodic signal with a high frequency (i.e. with a frequency above a threshold) is identified, the region is labelled as a vineyard. The second approach is a supervised (per-region) land cover classification. It uses texture indexes calculated from ortho-images as input image information. In particular, some texture indexes derived from SIFT descriptors calculated from ortho-images have been used in the experiments, giving good results.
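
    The first approach in this abstract (semi-variogram, Fourier transform, frequency threshold) can be sketched on a one-dimensional grey-level profile. This is a simplified illustration under stated assumptions, not the authors' method: it works on a single image row rather than a 2D region, uses a naive discrete Fourier transform instead of an FFT library, and the function names, the default thresholds `min_frequency` and `min_amplitude`, and the test profiles are all hypothetical.

```python
import cmath
import math

def semivariogram(profile, max_lag):
    """Empirical semi-variogram of a 1D grey-level profile for lags 1..max_lag."""
    gamma = []
    for lag in range(1, max_lag + 1):
        sq_diffs = [(profile[i + lag] - profile[i]) ** 2
                    for i in range(len(profile) - lag)]
        gamma.append(sum(sq_diffs) / (2 * len(sq_diffs)))
    return gamma

def dominant_frequency(signal):
    """Return (frequency in cycles per sample, amplitude) of the strongest DFT bin.

    The mean is removed first so that the constant component is ignored.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    best_k, best_amp = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return best_k / n, best_amp

def looks_like_vineyard(profile, max_lag=32, min_frequency=0.1, min_amplitude=1.0):
    """Label a region as vineyard if its semi-variogram has a strong, high-frequency period."""
    freq, amp = dominant_frequency(semivariogram(profile, max_lag))
    return amp >= min_amplitude and freq >= min_frequency

rows = [100, 200, 100, 0] * 16   # made-up profile repeating every 4 pixels
flat = [100] * 64                # made-up homogeneous profile
print(looks_like_vineyard(rows), looks_like_vineyard(flat))
```

    The amplitude check is an addition made here so that a flat region with no periodic signal is not labelled by its noise alone.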

  1. Optimization models for cancer classification: extracting gene interaction information from microarray expression data.

    Science.gov (United States)

    Antonov, Alexey V; Tetko, Igor V; Mader, Michael T; Budczies, Jan; Mewes, Hans W

    2004-03-22

    Microarray data appear particularly useful to investigate mechanisms in cancer biology and represent one of the most powerful tools to uncover the genetic mechanisms causing loss of cell cycle control. Recently, several different methods to employ microarray data as a diagnostic tool in cancer classification have been proposed. These procedures take changes in the expression of particular genes into account but do not consider disruptions in certain gene interactions caused by the tumor. It is probable that some genes participating in tumor development do not change their expression level dramatically. Thus, they cannot be detected by simple classification approaches used previously. For these reasons, a classification procedure exploiting information related to changes in gene interactions is needed. We propose a MAximal MArgin Linear Programming (MAMA) method for the classification of tumor samples based on microarray data. This procedure detects groups of genes and constructs models (features) that strongly correlate with particular tumor types. The detected features include genes whose functional relations are changed for particular cancer types. The proposed method was tested on two publicly available datasets and demonstrated a prediction ability superior to previously employed classification schemes. The MAMA system was developed using the linear programming system LINDO http://www.lindo.com. A Perl script that specifies the optimization problem for this software is available upon request from the authors.

  2. Comparative analysis of metagenomes of Italian top soil improvers

    International Nuclear Information System (INIS)

    Gigliucci, Federica; Brambilla, Gianfranco; Tozzoli, Rosangela; Michelacci, Valeria; Morabito, Stefano

    2017-01-01

    Biosolids originating from Municipal Waste Water Treatment Plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon on agricultural land. Their use to amend soil is controversial, as it may lead to the presence of emerging hazards of anthropogenic or animal origin in the environment devoted to food production. In this study, we used shotgun metagenomic sequencing as a tool to characterize the hazards related to the TSIs. The samples showed the presence of many virulence genes associated with different diarrheagenic E. coli pathotypes as well as of different antimicrobial resistance-associated genes. The genes conferring resistance to Fluoroquinolones were the most relevant class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent, traits associated with resistance to Methicillin in Staphylococci and genes conferring resistance to Streptothricin, Fosfomycin and Vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15–50% of the sequence reads mapping to the class of determinants of resistance to compounds in the different metagenomes. Moreover, the taxonomic analysis comparing compost-based samples with biosolids derived from municipal sewage-sludge treatments divided the samples into separate populations, based on the microbiota composition. The results confirm that metagenomics is efficient at detecting genomic traits associated with pathogens and antimicrobial resistance in complex matrices, and that this approach can be used for the traceability of TSI samples using the microorganisms’ profiles as indicators of their origin. - Highlights: • Sludge- and green- based biosolids analysed by metagenomics. • Biosolids may introduce microbial hazards in the food chain. • Metagenomics enables tracking biosolids’ sources.

  3. Functional metagenomics to decipher food-microbe-host crosstalk.

    Science.gov (United States)

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    The recent developments of metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics open a new window for the exploration of food-microbe-host crosstalk.

  4. Phylogeny and phylogeography of functional genes shared among seven terrestrial subsurface metagenomes reveal N-cycling and microbial evolutionary relationships

    Directory of Open Access Journals (Sweden)

    Maggie CY Lau

    2014-10-01

    Full Text Available Comparative studies on community phylogenetics and phylogeography of microorganisms living in extreme environments are rare. Terrestrial subsurface habitats are valuable for studying microbial biogeographical patterns due to their isolation and the restricted dispersal mechanisms. Since the taxonomic identity of a microorganism does not always correspond well with its functional role in a particular community, the use of taxonomic assignments or patterns may give limited inference on how microbial functions are affected by historical, geographical and environmental factors. With seven metagenomic libraries generated from fracture water samples collected from five South African mines, this study was carried out to (1) screen for ubiquitous functions or pathways of biogeochemical cycling of CH4, S and N; (2) characterize the biodiversity represented by the common functional genes; (3) investigate the subsurface biogeography as revealed by this subset of genes; and (4) explore the possibility of using metagenomic data for evolutionary study. The ubiquitous functional genes are NarV, NPD, PAP reductase, NifH, NifD, NifK, NifE and NifN genes. Although these 8 common functional genes were taxonomically and phylogenetically diverse and distinct from each other, the dissimilarity between samples did not correlate strongly with geographical or environmental factors or with the residence time of the water. Por genes homologous to those of Thermodesulfovibrio yellowstonii detected in all metagenomes were deep lineages of Nitrospirae, suggesting that subsurface habitats have preserved ancestral genetic signatures that inform the study of the origin and evolution of prokaryotes.

  5. Eodataservice.org: Big Data Platform to Enable Multi-disciplinary Information Extraction from Geospatial Data

    Science.gov (United States)

    Natali, S.; Mantovani, S.; Barboni, D.; Hogan, P.

    2017-12-01

    In 1999, US Vice-President Al Gore outlined the concept of `Digital Earth' as a multi-resolution, three-dimensional representation of the planet to find, visualise and make sense of vast amounts of geo-referenced information on physical and social environments, allowing users to navigate through space and time and to access historical and forecast data to support scientists, policy-makers, and any other user. The eodataservice platform (http://eodataservice.org/) implements the Digital Earth concept: eodataservice is a cross-domain platform that makes available a large set of multi-year global environmental collections, allowing data discovery, visualization, combination, processing and download. It implements a "virtual datacube" approach where data stored on distributed data centers are made available via standardized OGC-compliant interfaces. Dedicated web-based Graphic User Interfaces (based on the ESA-NASA WebWorldWind technology) as well as web-based notebooks (e.g. Jupyter notebook), desktop GIS tools and command line interfaces can be used to access and manipulate the data. The platform can be fully customized to users' needs. So far eodataservice has been used for the following thematic applications: high resolution satellite data distribution; land surface monitoring using SAR surface deformation data; atmosphere, ocean and climate applications; climate-health applications; urban environment monitoring; safeguard of cultural heritage sites; and support to farmers and (re)-insurances in the agricultural field. In the current work, the EO Data Service concept is presented as a key enabling technology; furthermore, various examples are provided to demonstrate the high level of interdisciplinarity of the platform.

  6. Comparative analysis of the sensitivity of metagenomic sequencing and PCR to detect a biowarfare simulant (Bacillus atrophaeus) in soil samples.

    Directory of Open Access Journals (Sweden)

    Delphine Plaire

    Full Text Available To evaluate the sensitivity of high-throughput DNA sequencing for monitoring biowarfare agents in the environment, we analysed soil samples inoculated with different amounts of Bacillus atrophaeus, a surrogate organism for Bacillus anthracis. The soil samples considered were a poorly carbonated soil of the silty sand class, and a highly carbonated soil of the silt class. Control soil samples and soil samples inoculated with 10, 10³, or 10⁵ cfu were processed for DNA extraction. About 1% of the DNA extracts was analysed through the sequencing of more than 10⁸ reads. Similar amounts of extracts were also studied for Bacillus atrophaeus DNA content by real-time PCR. We demonstrate that, for both soils, high-throughput sequencing is at least as sensitive as real-time PCR for detecting Bacillus atrophaeus DNA. We conclude that metagenomics allows the detection of less than 10 ppm of DNA from a biowarfare simulant in complex environmental samples.

  7. Comparative analysis of the sensitivity of metagenomic sequencing and PCR to detect a biowarfare simulant (Bacillus atrophaeus) in soil samples.

    Science.gov (United States)

    Plaire, Delphine; Puaud, Simon; Marsolier-Kergoat, Marie-Claude; Elalouf, Jean-Marc

    2017-01-01

    To evaluate the sensitivity of high-throughput DNA sequencing for monitoring biowarfare agents in the environment, we analysed soil samples inoculated with different amounts of Bacillus atrophaeus, a surrogate organism for Bacillus anthracis. The soil samples considered were a poorly carbonated soil of the silty sand class, and a highly carbonated soil of the silt class. Control soil samples and soil samples inoculated with 10, 10³, or 10⁵ cfu were processed for DNA extraction. About 1% of the DNA extracts was analysed through the sequencing of more than 10⁸ reads. Similar amounts of extracts were also studied for Bacillus atrophaeus DNA content by real-time PCR. We demonstrate that, for both soils, high-throughput sequencing is at least as sensitive as real-time PCR for detecting Bacillus atrophaeus DNA. We conclude that metagenomics allows the detection of less than 10 ppm of DNA from a biowarfare simulant in complex environmental samples.
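
    The "parts per million" figure in the record above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes that the fraction of reads assigned to the target organism approximates the fraction of target DNA, and the function names and the example depth of 10^8 reads are illustrative assumptions.

```python
def reads_to_ppm(target_reads: int, total_reads: int) -> float:
    """Express the share of target reads in parts per million (ppm).

    Assumption: read fraction is used as a stand-in for DNA fraction.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply first so that typical integer inputs stay exact.
    return target_reads * 1_000_000 / total_reads


def reads_for_ppm(ppm: float, total_reads: int) -> float:
    """Number of target reads that corresponds to a given ppm level."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return ppm * total_reads / 1_000_000


if __name__ == "__main__":
    depth = 10**8  # hypothetical sequencing depth, chosen for illustration
    print(reads_for_ppm(10, depth))   # reads equivalent to 10 ppm at this depth
    print(reads_to_ppm(250, depth))   # ppm represented by 250 hypothetical target reads
```

    Under these assumptions, a 10 ppm signal at a depth of 10^8 reads corresponds to about a thousand target reads, which gives a sense of why deep sequencing is needed for this kind of detection.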

  8. Analysis of bacterial xylose isomerase gene diversity using gene-targeted metagenomics.

    Science.gov (United States)

    Nurdiani, Dini; Ito, Michihiro; Maruyama, Toru; Terahara, Takeshi; Mori, Tetsushi; Ugawa, Shin; Takeyama, Haruko

    2015-08-01

    Bacterial xylose isomerases (XI) are promising resources for efficient biofuel production from xylose in lignocellulosic biomass. Here, we investigated xylose isomerase gene (xylA) diversity in three soil metagenomes differing in plant vegetation and geographical location, using an amplicon pyrosequencing approach and two newly-designed primer sets. A total of 158,555 reads from three metagenomic DNA replicates for each soil sample were classified into 1127 phylotypes, detected in triplicate and defined by 90% amino acid identity. The phylotype coverage was estimated to be within the range of 84.0-92.7%. The xylA gene phylotypes obtained were phylogenetically distributed across the two known xylA groups. They shared 49-100% identities with their closest-related XI sequences in GenBank. Differences in phylotype composition within a soil sample were significantly smaller than those between different soils based on a UniFrac distance analysis, suggesting soil-specific xylA genotypes and taxonomic compositions. The differences among xylA members and their compositions in the soil were strongly correlated with 16S rRNA variation between soil samples, also assessed by amplicon pyrosequencing. This is the first report of xylA diversity in environmental samples assessed by amplicon pyrosequencing. Our data provide information regarding xylA diversity in nature, and can be a basis for the screening of novel xylA genotypes for practical applications. Copyright © 2015. Published by Elsevier B.V.

  9. [Mini review] metagenomic studies of the Red Sea

    KAUST Repository

    Behzad, Hayedeh

    2015-10-23

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied amongst marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physicochemical studies describe the Red Sea as an oligotrophic environment that contains some of the warmest and saltiest waters in the world, with year-round high UV radiation. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmolyte C1 oxidation was found in the Red Sea metagenomic databases, suggesting acquisition of specific environmental adaptations by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based applications. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios where global ocean temperatures are expected to rise by 1–3 °C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the

  10. High throughput comparisons and profiling of metagenomes for industrially relevant enzymes

    KAUST Repository

    Alam, Intikhab

    2016-01-26

    More and more genomes and metagenomes are being sequenced since the advent of Next Generation Sequencing Technologies (NGS). Many metagenomic samples are collected from a variety of environments, each exhibiting a different environmental profile, e.g. temperature, environmental chemistry, etc. These metagenomes can be profiled to unearth enzymes relevant to several industries based on specific enzyme properties, such as the ability to work under extreme conditions (extreme temperatures, salinity, anaerobic conditions, etc.). In this work, we present the DMAP platform, comprising a high-throughput metagenomic annotation pipeline and a data warehouse for comparisons and profiling across a large number of metagenomes. We developed two reference databases for profiling of important genes, one containing enzymes related to different industries and the other containing genes with potential bioactivity roles. In this presentation we describe an example analysis of a large number of publicly available metagenomic samples from the Tara Oceans study (Science 2015), which covers a significant part of the world's oceans.

  11. Isolation and characterization of novel lipases from a metagenomic library of the microbial community in the pitcher fluid of the carnivorous plant Nepenthes hybrida.

    Science.gov (United States)

    Morohoshi, Tomohiro; Oikawa, Manabu; Sato, Shoko; Kikuchi, Noriko; Kato, Norihiro; Ikeda, Tsukasa

    2011-10-01

    Members of the genus Nepenthes are carnivorous plants that use the pitfall method of insect capture as a supplementary nutritional source. We extracted metagenomic DNA from the microbial community found in the pitcher fluid of Nepenthes and constructed a plasmid-based metagenomic library. An activity-based screening method enabled the isolation of two lipase genes, lip1 and lip2. Both Lip1 and Lip2 belong to a novel family or subfamily of lipases and show lipase activities in acidic conditions, such as those found in pitcher fluid. This study was conducted under the assumption that the secreted Lip1 and Lip2 were capable of enzymatic activity in the acidic pitcher fluid. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  12. Metagenomic analysis of lysogeny in Tampa Bay: implications for prophage gene expression.

    Directory of Open Access Journals (Sweden)

    Lauren McDaniel

    Full Text Available Phage integrase genes often play a role in the establishment of lysogeny in temperate phage by catalyzing the integration of the phage into one of the host's replicons. To investigate temperate phage gene expression, an induced viral metagenome from Tampa Bay was sequenced by 454/Pyrosequencing. The sequencing yielded 294,068 reads with 6.6% identifiable. One hundred three sequences had significant similarity to integrases by BLASTX analysis (e ≤ 0.001). Four sequences with strongest amino-acid level similarity to integrases were selected and real-time PCR primers and probes were designed. Initial testing with microbial fraction DNA from Tampa Bay revealed 1.9 × 10⁷ and 1300 gene copies L⁻¹ of Vibrio-like integrase and Oceanicola-like integrase, respectively. The other two integrases were not detected. The integrase assay was then tested on microbial fraction RNA extracted from 200 ml of Tampa Bay water sampled biweekly over a 12-month time series. Vibrio-like integrase gene expression was detected in three samples, with estimated copy numbers of 2.4-1280 L⁻¹. Clostridium-like integrase gene expression was detected in 6 samples, with estimated copy numbers of 37 to 265 L⁻¹. In all cases, detection of integrase gene expression corresponded to the occurrence of lysogeny as detected by prophage induction. Investigation of the environmental distribution of the two expressed integrases in the Global Ocean Survey Database found the Vibrio-like integrase was present in genome equivalents of 3.14% of microbial libraries and all four viral metagenomes. There were two similar genes in the library from British Columbia and one similar gene was detected in both the Gulf of Mexico and Sargasso Sea libraries. In contrast, in the Arctic library eleven similar genes were observed. The Clostridium-like integrase was less prevalent, being found in 0.58% of the microbial and none of the viral libraries. These results underscore the value of metagenomic data

  13. Metagenomes from two microbial consortia associated with Santa Barbara seep oil.

    Science.gov (United States)

    Hawley, Erik R; Malfatti, Stephanie A; Pagani, Ioanna; Huntemann, Marcel; Chen, Amy; Foster, Brian; Copeland, Alexander; del Rio, Tijana Glavina; Pati, Amrita; Jansson, Janet R; Gilbert, Jack A; Tringe, Susannah Green; Lorenson, Thomas D; Hess, Matthias

    2014-12-01

    The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean offshore the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Extracting information on the spatial variability in erosion rate stored in detrital cooling age distributions in river sands

    Science.gov (United States)

    Braun, Jean; Gemignani, Lorenzo; van der Beek, Peter

    2018-03-01

    One of the main purposes of detrital thermochronology is to provide constraints on the regional-scale exhumation rate and its spatial variability in actively eroding mountain ranges. Procedures that use cooling age distributions coupled with hypsometry and thermal models have been developed in order to extract quantitative estimates of erosion rate and its spatial distribution, assuming steady state between tectonic uplift and erosion. This hypothesis precludes the use of these procedures to assess the likely transient response of mountain belts to changes in tectonic or climatic forcing. Other methods are based on an a priori knowledge of the in situ distribution of ages to interpret the detrital age distributions. In this paper, we describe a simple method that, using the observed detrital mineral age distributions collected along a river, allows us to extract information about the relative distribution of erosion rates in an eroding catchment without relying on a steady-state assumption, the value of thermal parameters or an a priori knowledge of in situ age distributions. The model is based on a relatively low number of parameters describing lithological variability among the various sub-catchments and their sizes and only uses the raw ages. The method we propose is tested against synthetic age distributions to demonstrate its accuracy and the optimum conditions for its use. In order to illustrate the method, we invert age distributions collected along the main trunk of the Tsangpo-Siang-Brahmaputra river system in the eastern Himalaya. From the inversion of the cooling age distributions we predict present-day erosion rates of the catchments along the Tsangpo-Siang-Brahmaputra river system, as well as some of its tributaries. We show that detrital age distributions contain dual information about present-day erosion rate, i.e., from the predicted distribution of surface ages within each catchment and from the relative contribution of any given catchment to the

  15. MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes.

    Science.gov (United States)

    Petrenko, Pavel; Lobb, Briallen; Kurtz, Daniel A; Neufeld, Josh D; Doxey, Andrew C

    2015-11-05

    Metagenomes provide access to the taxonomic composition and functional capabilities of microbial communities. Although metagenomic analysis methods exist for estimating overall community composition or metabolic potential, identifying specific taxa that encode specific functions or pathways of interest can be more challenging. Here we present MetAnnotate, which addresses the common question: "which organisms perform my function of interest within my metagenome(s) of interest?" MetAnnotate uses profile hidden Markov models to analyze shotgun metagenomes for genes and pathways of interest, classifies retrieved sequences either through a phylogenetic placement or best hit approach, and enables comparison of these profiles between metagenomes. Based on a simulated metagenome dataset, the tool achieves high taxonomic classification accuracy for a broad range of genes, including both markers of community abundance and specific biological pathways. Lastly, we demonstrate MetAnnotate by analyzing for cobalamin (vitamin B12) synthesis genes across hundreds of aquatic metagenomes in a fraction of the time required by the commonly used Basic Local Alignment Search Tool top hit approach. MetAnnotate is multi-threaded and installable as a local web application or command-line tool on Linux systems. MetAnnotate is a useful framework for general and/or function-specific taxonomic profiling and comparison of metagenomes.

  16. The new science of metagenomics: revealing the secrets of our microbial planet

    National Research Council Canada - National Science Library

    Committee on Metagenomics; National Research Council; Division on Earth and Life Studies; National Research Council

    2007-01-01

    .... The emerging field of metagenomics offers a new way of exploring the microbial world that will transform modern microbiology and lead to practical applications in medicine, agriculture, alternative...

  17. Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges

    Science.gov (United States)

    Podell, Sheila; Taton, Arnaud; Schorn, Michelle A.; Busch, Julia; Lin, Zhenjian; Schmidt, Eric W.; Jensen, Paul R.; Paul, Valerie J.; Biggs, Jason S.; Golden, James W.; Allen, Eric E.; Moore, Bradley S.

    2017-01-01

    Naturally produced polybrominated diphenyl ethers (PBDEs) pervade the marine environment and structurally resemble toxic man-made brominated flame retardants. PBDEs bioaccumulate in marine animals and are likely transferred to the human food chain. However, the biogenic basis for PBDE production in one of their most prolific sources, marine sponges of the order Dysideidae, remains unidentified. Here, we report the discovery of PBDE biosynthetic gene clusters within sponge microbiome-associated cyanobacterial endosymbionts by employing an unbiased metagenome mining approach. By expression of PBDE biosynthetic genes in heterologous cyanobacterial hosts, we correlate the structural diversity of naturally produced PBDEs to modifications within PBDE biosynthetic gene clusters in multiple sponge holobionts. Our results establish the genetic and molecular foundation for the production of PBDEs in one of the most abundant natural sources of these molecules, further setting the stage for a metagenomic-based inventory of other PBDE sources in the marine environment. PMID:28319100

  18. A metagenomics portal for a democratized sequencing world.

    Science.gov (United States)

    Wilke, Andreas; Glass, Elizabeth M; Bartels, Daniela; Bischof, Jared; Braithwaite, Daniel; D'Souza, Mark; Gerlach, Wolfgang; Harrison, Travis; Keegan, Kevin; Matthews, Hunter; Kottmann, Renzo; Paczian, Tobias; Tang, Wei; Trimble, William L; Yilmaz, Pelin; Wilkening, Jared; Desai, Narayan; Meyer, Folker

    2013-01-01

    The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST. © 2013 Elsevier Inc. All rights reserved.

  19. Metagenome of a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones

    Energy Technology Data Exchange (ETDEWEB)

    Walsh, David A.; Zaikova, Elena; Howes, Charles L.; Song, Young; Wright, Jody; Tringe, Susannah G.; Tortell, Philippe D.; Hallam, Steven J.

    2009-07-15

    Oxygen minimum zones (OMZs), also known as oceanic "dead zones", are widespread oceanographic features currently expanding due to global warming and coastal eutrophication. Although inhospitable to metazoan life, OMZs support a thriving but cryptic microbiota whose combined metabolic activity is intimately connected to nutrient and trace gas cycling within the global ocean. Here we report time-resolved metagenomic analyses of a ubiquitous and abundant but uncultivated OMZ microbe (SUP05) closely related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation and nitrate respiration responsive to a wide range of water column redox states. Thus, SUP05 plays integral roles in shaping nutrient and energy flow within oxygen-deficient oceanic waters via carbon sequestration, sulfide detoxification and biological nitrogen loss with important implications for marine productivity and atmospheric greenhouse control.

  20. A metagenomic framework for the study of airborne microbial communities.

    Directory of Open Access Journals (Sweden)

    Shibu Yooseph

    Full Text Available Understanding the microbial content of the air has important scientific, health, and economic implications. While studies have primarily characterized the taxonomic content of air samples by sequencing the 16S or 18S ribosomal RNA gene, direct analysis of the genomic content of airborne microorganisms has not been possible due to the extremely low density of biological material in airborne environments. We developed sampling and amplification methods to enable adequate DNA recovery to allow metagenomic profiling of air samples collected from indoor and outdoor environments. Air samples were collected from a large urban building, a medical center, a house, and a pier. Analyses of metagenomic data generated from these samples reveal airborne communities with a high degree of diversity and different genera abundance profiles. The identities of many of the taxonomic groups and protein families also allow for the identification of the likely sources of the sampled airborne bacteria.

  1. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones.

    Science.gov (United States)

    Walsh, David A; Zaikova, Elena; Howes, Charles G; Song, Young C; Wright, Jody J; Tringe, Susannah G; Tortell, Philippe D; Hallam, Steven J

    2009-10-23

    Oxygen minimum zones, also known as oceanic "dead zones," are widespread oceanographic features currently expanding because of global warming. Although inhospitable to metazoan life, they support a cryptic microbiota whose metabolic activities affect nutrient and trace gas cycling within the global ocean. Here, we report metagenomic analyses of a ubiquitous and abundant but uncultivated oxygen minimum zone microbe (SUP05) related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation, and nitrate respiration responsive to a wide range of water-column redox states. Our analysis provides a genomic foundation for understanding the ecological and biogeochemical role of pelagic SUP05 in oxygen-deficient oceanic waters and its potential sensitivity to environmental changes.

  2. Extremozymes from metagenome: Potential applications in food processing.

    Science.gov (United States)

    Khan, Mahejibin; Sathya, T A

    2017-06-12

    The long-established use of enzymes for food processing and product formulation has resulted in an enzyme market growing at a compound annual rate of 7.0%. Advancements in molecular biology, and the recognition that enzymes with specific properties have applications in the industrial production of infant, baby and functional foods, have boosted research toward sourcing the genes of microorganisms for enzymes with distinctive properties. In this regard, functional metagenomics for extremozymes has gained attention on the premise that such enzymes can catalyze specific reactions. Hence, metagenomics, which can isolate functional genes of unculturable extremophilic microorganisms, has attracted growing attention as a promising tool. Developments in this field of research in relation to the food sector are reviewed.

  3. Fast and sensitive taxonomic classification for metagenomics with Kaiju

    DEFF Research Database (Denmark)

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    ... and genomes in the reference database. Here, we present the novel metagenome classifier Kaiju for fast assignment of reads to taxa. Kaiju finds maximum exact matches on the protein-level using the Burrows-Wheeler transform, and can optionally allow amino acid substitutions in the search using a greedy heuristic. We show in a genome exclusion study that Kaiju can classify more reads with higher sensitivity and similar precision compared to fast k-mer based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies more than twice as many reads in ten real metagenomes compared to programs based on genomic k-mers. Kaiju can process up to millions of reads per minute, and its memory footprint is below 5 GB of RAM, allowing the analysis on a standard PC. The program is available under the GPL3 license at: github.com/bioinformatics-centre/kaiju
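
    The record above mentions exact matching with the Burrows-Wheeler transform. As a rough illustration of the underlying idea only, the sketch below builds the transform of a short string by sorting rotations and counts exact occurrences of a pattern with backward search. It is not Kaiju's implementation: Kaiju searches for maximum exact matches in a protein database with far more efficient data structures, and the function names and the toy protein string here are assumptions made for the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (illustrative, not efficient)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_exact_matches(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` in the original text using backward search."""
    # first[c] = index of the first row (in sorted order) that starts with c
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Number of occurrences of c in bwt_str[:i] (a real index precomputes this).
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # current half-open range of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    protein = "MKVLAAGIMKVLS"  # toy sequence, not taken from any database
    index = bwt(protein)
    print(index)
    print(count_exact_matches(index, "MKVL"))
```

    A production classifier would replace the linear-time `occ` scan with precomputed rank tables, which is what makes this kind of search fast enough for millions of reads.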

  4. Metagenomic Survey of a Military-Impacted Lagoon in Puerto Rico

    Science.gov (United States)

    Davila-Santiago, L.; DeLeon-Rodriguez, N.; LaSanta-Pagan, K. Y.; Kurt, Z.; Padilla-Crespo, E.; Hatt, J.; Spain, J.; Konstantinidis, K.; Massol-Deya, A.

    2016-02-01

    Military practices have left a legacy of contamination worldwide. In Puerto Rico, the east part of the populated Vieques Island was used for over fifty years as a bombing range by the Navy. A year after the base was closed in 2003, the impacted area was designated as a Superfund site. Previous studies have shown elevated levels of heavy metals, explosives (e.g. RDX, TNT, HMX), and other toxic chemicals at the site. The Anones Lagoon, located in the middle of the bombing range, is one of the most polluted spots within the site. Intermittently, the lagoon is connected through a channel to the Caribbean Sea. In order to describe the microbial diversity and its potential contribution to natural attenuation of explosives, sediment samples have been collected since 2005. Sediment from reference lagoons (San Juan and Cabo Rojo) has also been sampled and analyzed in parallel for comparison. Total DNA was extracted and sequenced using the Illumina MiSeq platform. Results indicate that Gammaproteobacteria were abundant in all lagoon samples but the Vieques lagoon harbors overall different microbial taxa. Alpha diversity analysis showed that Anones was less diverse compared to the pristine Cabo Rojo lagoon. Importantly, a clear shift was seen in the Anones Lagoon in 2013 compared to 2005, where Halomonas spp. became dominant (up to 25%) while other groups like Marinobacter showed signs of enrichment as well. Interestingly, these groups have been shown to degrade explosive-related chemicals in tropical sediments. Functional gene annotation of the Anones metagenome showed the presence of RDX degradation genes such as cytochrome p450. This study is the first comparative metagenomic survey of lagoons in Puerto Rico that explored the microbial diversity and biodegradation potential at Vieques.
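
    The record above compares alpha diversity between lagoons without naming the index used. As one common possibility, the sketch below computes the Shannon index from per-taxon read counts. The choice of index, the function name and the count vectors are all assumptions for illustration and are not data from the study.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    values = [c for c in counts if c > 0]
    total = sum(values)
    if total == 0:
        raise ValueError("at least one taxon must have a positive count")
    return -sum((c / total) * math.log(c / total) for c in values)


if __name__ == "__main__":
    # Hypothetical read counts per taxon, invented for this example.
    even_community = [25, 25, 25, 25]
    dominated_community = [85, 5, 5, 5]
    print(shannon_index(even_community))
    print(shannon_index(dominated_community))
```

    A community dominated by one taxon yields a lower value than an even community with the same number of taxa, which is the kind of contrast an alpha diversity comparison is meant to capture.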

  5. Metagenomic Analysis of Milk of Healthy and Mastitis-Suffering Women.

    Science.gov (United States)

    Jiménez, Esther; de Andrés, Javier; Manrique, Marina; Pareja-Tobes, Pablo; Tobes, Raquel; Martínez-Blanch, Juan F; Codoñer, Francisco M; Ramón, Daniel; Fernández, Leónides; Rodríguez, Juan M

    2015-08-01

    Some studies have been conducted to assess the composition of the bacterial communities inhabiting human milk, but they did not evaluate the presence of other microorganisms, such as fungi, archaea, protozoa, or viruses. This study aimed to compare the metagenome of human milk samples provided by healthy and mastitis-suffering women. DNA was isolated from human milk samples collected from 10 healthy women and 10 women with symptoms of lactational mastitis. Shotgun libraries from total extracted DNA were constructed and the libraries were sequenced by 454 pyrosequencing. The amount of human DNA sequences was ≥ 90% in all the samples. Among the bacterial sequences, the predominant phyla were Proteobacteria, Firmicutes, and Bacteroidetes. The healthy core microbiome included the genera Staphylococcus, Streptococcus, Bacteroides, Faecalibacterium, Ruminococcus, Lactobacillus, and Propionibacterium. At the species level, a high degree of inter-individual variability was observed among healthy women. In contrast, Staphylococcus aureus clearly dominated the microbiome in the samples from the women with acute mastitis whereas high increases in Staphylococcus epidermidis-related reads were observed in the milk of those suffering from subacute mastitis. Fungal and protozoa-related reads were identified in most of the samples, whereas Archaea reads were absent in samples from women with mastitis. Some viral-related sequence reads were also detected. Human milk contains a complex microbial metagenome constituted by the genomes of bacteria, archaea, viruses, fungi, and protozoa. In mastitis cases, the milk microbiome reflects a loss of bacterial diversity and a high increase of the sequences related to the presumptive etiological agents. © The Author(s) 2015.

  6. Application of metagenomics in the human gut microbiome

    OpenAIRE

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-01

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of ...

  7. Metagenome-derived haloalkane dehalogenases with novel catalytic properties

    Czech Academy of Sciences Publication Activity Database

    Kotík, Michael; Vaňáček, P.; Kuňka, A.; Prokop, Z.; Damborský, J.

    2017-01-01

    Vol. 101, No. 16 (2017), pp. 6385-6397 ISSN 0175-7598 R&D Projects: GA ČR GAP504/10/0137; GA MŠk(CZ) LM2015047; GA MŠk(CZ) LM2015055 Institutional support: RVO:61388971 Keywords: Haloalkane dehalogenase * Metagenomic DNA * Heterologous production Subject RIV: CE - Biochemistry OECD field: Biochemistry and molecular biology Impact factor: 3.420, year: 2016

  8. Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis

    OpenAIRE

    Le Chatelier, Emmanuelle; He, Zhixing; Zhong, Wendi; Fan, Yongsheng; Zhang, Linshuang; Li, Haichang; Wu, Chunyan; Hu, Changfeng; Xu, Qian; Zhou, Jia; Cai, Shunfeng; Wang, Dawei; Huang, Yun; Breban, Maxime; Qin, Nan

    2017-01-01

    Background: The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Ankylosing spondylitis is an inflammatory autoimmune disease and evidence showed that ankylosing spondylitis may be a microbiome-driven disease. Results: To investigate the relationship between the gut microbiome and ankylosing spondylitis, a quantitative metagenomics study based on deep shotgun sequencing was performed, using gut microbial DNA from 211 ...

  9. Forest harvesting reduces the soil metagenomic potential for biomass decomposition.

    Science.gov (United States)

    Cardenas, Erick; Kranabetter, J M; Hope, Graeme; Maas, Kendra R; Hallam, Steven; Mohn, William W

    2015-11-01

    Soil is the key resource that must be managed to ensure sustainable forest productivity. Soil microbial communities mediate numerous essential ecosystem functions, and recent studies show that forest harvesting alters soil community composition. From a long-term soil productivity study site in a temperate coniferous forest in British Columbia, 21 forest soil shotgun metagenomes were generated, totaling 187 Gb. A method to analyze unassembled metagenome reads from the complex community was optimized and validated. The subsequent metagenome analysis revealed that, 12 years after forest harvesting, there were 16% and 8% reductions in relative abundances of biomass decomposition genes in the organic and mineral soil layers, respectively. Organic and mineral soil layers differed markedly in genetic potential for biomass degradation, with the organic layer having greater potential and being more strongly affected by harvesting. Gene families were disproportionately affected, and we identified 41 gene families consistently affected by harvesting, including families involved in lignin, cellulose, hemicellulose and pectin degradation. The results strongly suggest that harvesting profoundly altered below-ground cycling of carbon and other nutrients at this site, with potentially important consequences for forest regeneration. Thus, it is important to determine whether these changes foreshadow long-term changes in forest productivity or resilience and whether these changes are broadly characteristic of harvested forests.

  10. PhyloSift: phylogenetic analysis of genomes and metagenomes.

    Science.gov (United States)

    Darling, Aaron E; Jospin, Guillaume; Lowe, Eric; Matsen, Frederick A; Bik, Holly M; Eisen, Jonathan A

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  11. PhyloSift: phylogenetic analysis of genomes and metagenomes

    Directory of Open Access Journals (Sweden)

    Aaron E. Darling

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  12. Bioinformatic approaches reveal metagenomic characterization of soil microbial community.

    Directory of Open Access Journals (Sweden)

    Zhuofei Xu

    As is well known, soil is a complex ecosystem harboring the most prokaryotic biodiversity on Earth. In recent years, the advent of high-throughput sequencing techniques has greatly facilitated the progress of soil ecological studies. However, effectively understanding the underlying biological features of large-scale sequencing data is a new challenge. In the present study, we used 33 publicly available metagenomes from diverse soil sites (i.e., grassland, forest soil, desert, Arctic soil, and mangrove sediment) and integrated some state-of-the-art computational tools to explore the phylogenetic and functional characteristics of the microbial communities in soil. Microbial composition and metabolic potential in soils were comprehensively illustrated at the metagenomic level. A spectrum of metagenomic biomarkers comprising 46 taxa and 33 metabolic modules was found to differ significantly and could be used as indicators to distinguish at least one of the five soil communities. The co-occurrence associations between complex microbial compositions and functions were inferred by network-based approaches. Our results, together with the established bioinformatic pipelines, should provide a foundation for future research into the relation between soil biodiversity and ecosystem function.

  13. Challenges of the Unknown: Clinical Application of Microbial Metagenomics

    Directory of Open Access Journals (Sweden)

    Graham Rose

    2015-01-01

    The availability of fast, high-throughput, and low-cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host-enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model, using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres.

  14. Revealing large metagenomic regions through long DNA fragment hybridization capture.

    Science.gov (United States)

    Gasc, Cyrielle; Peyret, Pierre

    2017-03-14

    High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes from single organisms or metagenomic samples. However, due to the limited capacity of short-read sequence data to assemble complex or low coverage regions, genomes are typically fragmented, leading to draft genomes with numerous underexplored large genomic regions. Revealing these missing sequences is a major goal to resolve concerns in numerous biological studies. To overcome these limitations, we developed an innovative target enrichment method for the reconstruction of large unknown genomic regions. Based on a hybridization capture strategy, this approach enables the enrichment of large genomic regions, allowing the reconstruction of tens of kilobase pairs flanking a short, targeted DNA sequence. Applied to a metagenomic soil sample targeting the linA gene, the biomarker of hexachlorocyclohexane (HCH) degradation, our method permitted the enrichment of the gene and its flanking regions, leading to the reconstruction of several contigs and complete plasmids exceeding tens of kilobase pairs surrounding linA. Thus, through gene association and genome reconstruction, we identified microbial species involved in HCH degradation which constitute targets to improve biostimulation treatments. This new hybridization capture strategy makes surveying and deconvoluting complex genomic regions possible through the enrichment of large genomic regions and allows the efficient exploration of metagenomic diversity. Indeed, this approach makes it possible to assign identity and function to microorganisms in natural environments, one of the ultimate goals of microbial ecology.

  15. Application of Text Information Extraction System for Real-Time Cancer Case Identification in an Integrated Healthcare Organization.

    Science.gov (United States)

    Xie, Fagen; Lee, Janet; Munoz-Plaza, Corrine E; Hahn, Erin E; Chen, Wansu

    2017-01-01

    Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application leveraging natural language processing technologies and focused on the processing of pathology and/or radiology reports. We deployed the TIES system and integrated SPRs into the TIES system on a daily basis at Kaiser Permanente Southern California. The breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes to describe breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search for the coded SPRs in the back-end datastore directly. The identified cases were then compared with the breast cancer patients pulled from CANREG. A total of 437 breast cancer concept terms and 14 combinations of "breast" and "cancer" terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 were pulled from CANREG. Out of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Out of the 277 cases, 11% were determined as highly likely to be cases after manual examinations, and 86% were in CANREG but were diagnosed in months other than December of 2013. The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a great tool for identifying various potential cancer cases in a timely manner and on a regular basis in support of clinical research studies.
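The validation figures in this abstract can be turned into a detection rate with simple arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the abstract itself does not report a sensitivity value, and the counts derived from the 11% and 86% figures are approximate because those percentages are rounded.

```python
# Counts reported in the abstract above.
registry_cases = 249   # breast cancer cases diagnosed in December 2013 (CANREG)
found_by_ties = 241    # of those, cases the TIES system identified
extra_cases = 277      # additional TIES cases outside the validation sample


def percent(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of registry cases that TIES recovered (a recall / sensitivity estimate).
sensitivity = percent(found_by_ties, registry_cases)
missed = registry_cases - found_by_ties

# Approximate counts behind the rounded percentages for the extra cases.
likely_new = round(extra_cases * 0.11)
in_registry_other_month = round(extra_cases * 0.86)

print(f"sensitivity: {sensitivity:.1f}% ({missed} registry cases missed)")
print(f"extra cases judged highly likely: about {likely_new}")
print(f"extra cases in CANREG, other months: about {in_registry_other_month}")
```

Under these numbers the system recovered roughly 96.8% of the registry cases for that month.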

  16. Application of text information extraction system for real-time cancer case identification in an integrated healthcare organization

    Directory of Open Access Journals (Sweden)

    Fagen Xie

    2017-01-01

    Background: Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application leveraging natural language processing technologies and focused on the processing of pathology and/or radiology reports. Methods: We deployed the TIES system and integrated SPRs into the TIES system on a daily basis at Kaiser Permanente Southern California. The breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes to describe breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search for the coded SPRs in the back-end datastore directly. The identified cases were then compared with the breast cancer patients pulled from CANREG. Results: A total of 437 breast cancer concept terms and 14 combinations of “breast” and “cancer” terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 were pulled from CANREG. Out of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Out of the 277 cases, 11% were determined as highly likely to be cases after manual examinations, and 86% were in CANREG but were diagnosed in months other than December of 2013. Conclusions: The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a great tool for identifying various potential cancer cases in a timely manner and on a regular basis

  17. Cloning, expression and characteristics of a novel alkalistable and thermostable xylanase encoding gene (Mxyl) retrieved from compost-soil metagenome.

    Directory of Open Access Journals (Sweden)

    Digvijay Verma

    BACKGROUND: Alkalistable and thermostable xylanases are in high demand for pulp bleaching in the paper industry and for generating xylooligosaccharides by hydrolyzing the xylan component of agro-residues. The compost-soil samples, one of the hot environments, are expected to be a rich source of microbes with thermostable enzymes. METHODOLOGY/PRINCIPAL FINDINGS: Metagenomic DNA from hot environmental samples could be a rich source of novel biocatalysts. While screening a metagenomic library constructed from DNA extracted from the compost-soil in the p18GFP vector, a clone (TSDV-MX1) was detected that exhibited a clear zone of xylan hydrolysis on an RBB xylan plate. The sequencing of the 6.321 kb DNA insert and its BLAST analysis detected the presence of a xylanase gene that comprised 1077 bp. The deduced protein sequence (358 amino acids) displayed homology with glycosyl hydrolase (GH) family 11 xylanases. The gene was subcloned into the pET28a vector and expressed in E. coli BL21 (DE3). The recombinant xylanase (rMxyl) exhibited activity over a broad range of pH and temperature with optima at pH 9.0 and 80°C. The recombinant xylanase is highly thermostable, with a T1/2 of 2 h at 80°C and 15 min at 90°C. CONCLUSION/SIGNIFICANCE: This is the first report on the retrieval, through a metagenomic approach, of a xylanase gene that encodes an enzyme with alkalistability and thermostability. The recombinant xylanase has a potential application in the paper and pulp industry in pulp bleaching and in generating xylooligosaccharides from the abundantly available agro-residues.

  18. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

    Directory of Open Access Journals (Sweden)

    Matthew Z. DeMaere

    2016-11-01

    Background: Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods: We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results: When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion: Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development.
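To give a sense of how a B-cubed style score judges a clustering, here is a minimal sketch of the basic B-cubed precision and recall for hard clusterings only. It is not the authors' adaptation of the extended measure, which also handles overlapping (soft) clusters. The function name, the contig identifiers, and the toy labels are all hypothetical, and both mappings are assumed to cover the same set of items.

```python
from collections import defaultdict


def bcubed(predicted, truth):
    """Basic B-cubed precision, recall and F1 for a hard clustering.

    predicted, truth: dicts mapping each item to one cluster / class label.
    Both dicts are assumed to contain exactly the same items.
    """
    clusters = defaultdict(set)  # predicted cluster label -> items
    classes = defaultdict(set)   # true class label -> items
    for item, label in predicted.items():
        clusters[label].add(item)
    for item, label in truth.items():
        classes[label].add(item)

    precision = recall = 0.0
    for item in predicted:
        same_cluster = clusters[predicted[item]]
        same_class = classes[truth[item]]
        overlap = len(same_cluster & same_class)  # always >= 1 (the item itself)
        precision += overlap / len(same_cluster)
        recall += overlap / len(same_class)

    n = len(predicted)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: five contigs from two genomes, with contig c3 placed in the wrong bin.
truth = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
predicted = {"c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": 1}
p, r, f = bcubed(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The score is averaged per item rather than per cluster, so a single misplaced contig lowers both precision (its new bin becomes mixed) and recall (its genome is split).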

  19. Data on metagenomic profiles of activated sludge from a full-scale wastewater treatment plant

    Directory of Open Access Journals (Sweden)

    Jianhua Guo

    2017-12-01

    The data in this article mainly present the sequences of activated sludge from a full-scale municipal wastewater treatment plant (WWTP) carrying out simultaneous nitrogen and phosphorous removal in Beijing, China. Data include the operational conditions and performance, dominant microbes and taxonomic analysis in this WWTP, and function annotation results based on SEED, Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Sequencing data were generated using the Illumina HiSeq 2000 platform according to the recommendations of the manufacturer. The sequencing data have been deposited in the MG-RAST server (project ID: mgm4735473.3). For more information, see “Unraveling microbial structure and diversity of activated sludge in a full-scale simultaneous nitrogen and phosphorus removal plant using metagenomic sequencing” by Guo et al. (2017) [1].

  20. Metagenomics Study on the Polymorphism of Gut Microbiota and Their Function on Human Health

    DEFF Research Database (Denmark)

    Feng, Qiang

    gut catalog revealed that only 4% of the genes were shared by the human counterpart, highlighting the significant difference between these two gut microbiome datasets. To understand the establishing process of gut microbiome during the first year of life after birth, fecal samples from 98 infants......As the key component of the human micro-ecosystem, intestinal microorganisms transfer energies and exchange information with the human body. While they play essential roles in maintaining homeostasis in our bodies, until recently, we have had very limited understanding of the extent of taxonomic...... diversity and functional complexity of the gut microbiome. Facilitated by the Next Generation Sequencing (NGS) technologies and the progress of bioinformatics in the past decade, we have acquired substantial achievements in metagenomic studies on human gut microbiome and established the fundamentals of our...