WorldWideScience

Sample records for genomics approach matching

  1. Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges.

    Science.gov (United States)

    Cai, Binghuang; Li, Biao; Kiga, Nikki; Thusberg, Janita; Bergquist, Timothy; Chen, Yun-Ching; Niknafs, Noushin; Carter, Hannah; Tokheim, Collin; Beleva-Guthrie, Violeta; Douville, Christopher; Bhattacharya, Rohit; Yeo, Hui Ting Grace; Fan, Jean; Sengupta, Sohini; Kim, Dewey; Cline, Melissa; Turner, Tychele; Diekhans, Mark; Zaucha, Jan; Pal, Lipika R; Cao, Chen; Yu, Chen-Hsin; Yin, Yizhou; Carraro, Marco; Giollo, Manuel; Ferrari, Carlo; Leonardi, Emanuela; Tosatto, Silvio C E; Bobe, Jason; Ball, Madeleine; Hoskins, Roger A; Repo, Susanna; Church, George; Brenner, Steven E; Moult, John; Gough, Julian; Stanke, Mario; Karchin, Rachel; Mooney, Sean D

    2017-09-01

    The advent of next-generation sequencing has dramatically decreased the cost for whole-genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features. © 2017 Wiley Periodicals, Inc.
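
As an illustration of the ROC AUC measure mentioned in this abstract, the following is a minimal sketch and not the CAGI assessment code. It computes the AUC as the fraction of (trait-positive, trait-negative) pairs that a predictor ranks correctly; the function name `roc_auc` and the toy data are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = individual has the trait, scores = predicted probability.
has_trait = [1, 0, 1, 0, 0]
predicted = [0.9, 0.2, 0.6, 0.7, 0.1]
print(roc_auc(has_trait, predicted))
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to a predictor that ranks every trait carrier above every non-carrier.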

  2. A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes.

    Science.gov (United States)

    Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

    2018-04-01

    We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
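
The perfect-match idea can be sketched in a few lines, under strong simplifying assumptions: error-free reads, a single strand, and a fixed word length `k`. This is not the authors' PMGL pipeline. For every position of the reference, the sketch counts how often the k-mer starting there occurs exactly in the query reads; a run of zeros marks a location where the query differs from the reference. All names here are hypothetical.

```python
from collections import Counter


def perfect_match_landscape(reference, reads, k):
    """For each reference position, count exact occurrences of its k-mer in the reads."""
    read_kmers = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            read_kmers[read[i:i + k]] += 1
    return [read_kmers[reference[i:i + k]] for i in range(len(reference) - k + 1)]


reference = "ACGTACGGTCAGT"
# Toy query genome carrying one substitution (G -> T at 0-based position 6).
query = "ACGTACTGTCAGT"
reads = [query[i:i + 8] for i in range(len(query) - 8 + 1)]

landscape = perfect_match_landscape(reference, reads, k=4)
uncovered = [i for i, count in enumerate(landscape) if count == 0]
print(uncovered)  # start positions of reference k-mers with no perfect match
```

In this toy case the uncovered k-mers are exactly those that span the substituted base, which locates the variant without yet saying what kind of variant it is.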

  3. A computational approach to distinguish somatic vs. germline origin of genomic alterations from deep sequencing of cancer specimens without a matched normal.

    Directory of Open Access Journals (Sweden)

    James X Sun

    2018-02-01

    Full Text Available A key constraint in genomic testing in oncology is that matched normal specimens are not commonly obtained in clinical practice. Thus, while well-characterized genomic alterations do not require normal tissue for interpretation, a significant number of alterations will be unknown in whether they are germline or somatic, in the absence of a matched normal control. We introduce SGZ (somatic-germline-zygosity), a computational method for predicting somatic vs. germline origin and homozygous vs. heterozygous or sub-clonal state of variants identified from deep massively parallel sequencing (MPS) of cancer specimens. The method does not require a patient matched normal control, enabling broad application in clinical research. SGZ predicts the somatic vs. germline status of each alteration identified by modeling the alteration's allele frequency (AF), taking into account the tumor content, tumor ploidy, and the local copy number. Accuracy of the prediction depends on the depth of sequencing and copy number model fit, which are achieved in our clinical assay by sequencing to high depth (>500x) using MPS, covering 394 cancer-related genes and over 3,500 genome-wide single nucleotide polymorphisms (SNPs). Calls are made using a statistic based on read depth and local variability of SNP AF. To validate the method, we first evaluated performance on samples from 30 lung and colon cancer patients, where we sequenced tumors and matched normal tissue. We examined predictions for 17 somatic hotspot mutations and 20 common germline SNPs in 20,182 clinical cancer specimens. To assess the impact of stromal admixture, we examined three cell lines, which were titrated with their matched normal to six levels (10-75%). Overall, predictions were made in 85% of cases, with 95-99% of variants predicted correctly, a significantly superior performance compared to a basic approach based on AF alone.
We then applied the SGZ method to the COSMIC database of known somatic variants
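
To make the allele-frequency reasoning concrete, here is a minimal sketch of a standard tumor/normal mixture model for the expected AF of a variant. It is an assumption of this example that the model takes this form: normal cells are diploid and carry one copy of a heterozygous germline variant, and none of a somatic one. The sketch does not reproduce the SGZ statistic itself, and the function and parameter names are hypothetical.

```python
def expected_af(purity, copy_number, mutant_copies, germline):
    """Expected variant allele frequency in a tumor/normal mixture.

    purity        -- fraction of tumor cells in the specimen (0..1)
    copy_number   -- total copies of the locus in tumor cells
    mutant_copies -- copies carrying the variant in tumor cells
    germline      -- True if normal cells carry one variant copy (heterozygous)
    """
    normal_variant = (1.0 - purity) if germline else 0.0
    total_alleles = purity * copy_number + 2.0 * (1.0 - purity)
    return (purity * mutant_copies + normal_variant) / total_alleles


# At 50% tumor content and a diploid locus, the two origins separate clearly.
print(expected_af(0.5, 2, 1, germline=False))  # somatic
print(expected_af(0.5, 2, 1, germline=True))   # germline
```

The two expectations move closer together as tumor content approaches 100%, which is one reason a prediction cannot be made for every variant.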

  4. Approaches for Stereo Matching

    Directory of Open Access Journals (Sweden)

    Takouhi Ozanian

    1995-04-01

    Full Text Available This review focuses on the last decade's development of the computational stereopsis for recovering three-dimensional information. The main components of the stereo analysis are exposed: image acquisition and camera modeling, feature selection, feature matching and disparity interpretation. A brief survey is given of the well known feature selection approaches and the estimation parameters for this selection are mentioned. The difficulties in identifying correspondent locations in the two images are explained. Methods as to how effectively to constrain the search for correct solution of the correspondence problem are discussed, as are strategies for the whole matching process. Reasons for the occurrence of matching errors are considered. Some recently proposed approaches, employing new ideas in the modeling of stereo matching in terms of energy minimization, are described. Acknowledging the importance of computation time for real-time applications, special attention is paid to parallelism as a way to achieve the required level of performance. The development of trinocular stereo analysis as an alternative to the conventional binocular one, is described. Finally a classification based on the test images for verification of the stereo matching algorithms, is supplied.

  5. Fast and accurate phylogeny reconstruction using filtered spaced-word matches

    Science.gov (United States)

    Sohrabi-Jahromi, Salma; Morgenstern, Burkhard

    2017-01-01

    Abstract Motivation: Word-based or ‘alignment-free’ algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don’t-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don’t-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don’t-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. Availability and Implementation: The program source code for FSWM including a documentation, as well as the software that we used to generate artificial genome sequences are freely available at http://fswm.gobics.de/ Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073754
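
The spaced-word idea can be sketched as follows. This is an illustrative simplification, not the FSWM program. The filter here is a plain cap on the mismatch fraction at don't-care positions, where the abstract describes a similarity threshold, and the Jukes-Cantor correction is an assumption of this example. The pattern uses `1` for match positions and `0` for don't-care positions.

```python
import math
from collections import defaultdict


def spaced_word_matches(seq_a, seq_b, pattern):
    """Yield (i, j, mismatches) where seq_a[i:] and seq_b[j:] agree at all match positions."""
    match_pos = [p for p, c in enumerate(pattern) if c == "1"]
    dont_care = [p for p, c in enumerate(pattern) if c == "0"]
    span = len(pattern)
    index = defaultdict(list)
    for i in range(len(seq_a) - span + 1):
        index["".join(seq_a[i + p] for p in match_pos)].append(i)
    for j in range(len(seq_b) - span + 1):
        word = "".join(seq_b[j + p] for p in match_pos)
        for i in index.get(word, []):
            mismatches = sum(seq_a[i + p] != seq_b[j + p] for p in dont_care)
            yield i, j, mismatches


def estimate_distance(seq_a, seq_b, pattern, max_mismatch_fraction=0.5):
    """Estimate substitutions per site from don't-care positions of filtered matches."""
    n_dont_care = pattern.count("0")
    if n_dont_care == 0:
        raise ValueError("pattern needs at least one don't-care position")
    compared = mismatched = 0
    for _, _, m in spaced_word_matches(seq_a, seq_b, pattern):
        if m / n_dont_care > max_mismatch_fraction:
            continue  # treat as a spurious random match
        compared += n_dont_care
        mismatched += m
    if compared == 0:
        return None
    p = mismatched / compared
    if p >= 0.75:
        return math.inf  # Jukes-Cantor correction is undefined here
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(list(spaced_word_matches("ACGTAC", "ACCTAC", "1101")))
```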

  6. A multiscale approach to mutual information matching

    NARCIS (Netherlands)

    Pluim, J.P.W.; Maintz, J.B.A.; Viergever, M.A.; Hanson, K.M.

    1998-01-01

    Methods based on mutual information have shown promising results for matching of multimodal brain images. This paper discusses a multiscale approach to mutual information matching, aiming for an acceleration of the matching process while considering the accuracy and robustness of the method. Scaling
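
For reference, the similarity measure itself can be sketched as below. Images are assumed to be given as flattened lists of already-binned intensities of equal length. The multiscale scheme discussed in the paper is not shown, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b):
    """Mutual information (in bits) between two equally sized intensity lists."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))
    marg_a = Counter(img_a)
    marg_b = Counter(img_b)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (marg_a[a] * marg_b[b]))
    return mi


print(mutual_information([0, 0, 1, 1], [5, 5, 9, 9]))  # intensities fully dependent
print(mutual_information([0, 0, 1, 1], [5, 9, 5, 9]))  # intensities independent
```

A registration method would evaluate this measure for many candidate transformations of one image and keep the transformation that maximizes it.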

  7. Genome-wide association analysis accounting for environmental factors through propensity-score matching: application to stressful live events in major depressive disorder.

    Science.gov (United States)

    Power, Robert A; Cohen-Woods, Sarah; Ng, Mandy Y; Butler, Amy W; Craddock, Nick; Korszun, Ania; Jones, Lisa; Jones, Ian; Gill, Michael; Rice, John P; Maier, Wolfgang; Zobel, Astrid; Mors, Ole; Placentino, Anna; Rietschel, Marcella; Aitchison, Katherine J; Tozzi, Federica; Muglia, Pierandrea; Breen, Gerome; Farmer, Anne E; McGuffin, Peter; Lewis, Cathryn M; Uher, Rudolf

    2013-09-01

    Stressful life events are an established trigger for depression and may contribute to the heterogeneity within genome-wide association analyses. With depression cases showing an excess of exposure to stressful events compared to controls, there is difficulty in distinguishing between "true" cases and a "normal" response to a stressful environment. This potential contamination of cases, and that from genetically at risk controls that have not yet experienced environmental triggers for onset, may reduce the power of studies to detect causal variants. In the RADIANT sample of 3,690 European individuals, we used propensity score matching to pair cases and controls on exposure to stressful life events. In 805 case-control pairs matched on stressful life event, we tested the influence of 457,670 common genetic variants on the propensity to depression under comparable level of adversity with a sign test. While this analysis produced no significant findings after genome-wide correction for multiple testing, we outline a novel methodology and perspective for providing environmental context in genetic studies. We recommend contextualizing depression by incorporating environmental exposure into genome-wide analyses as a complementary approach to testing gene-environment interactions. Possible explanations for negative findings include a lack of statistical power due to small sample size and conditional effects, resulting from the low rate of adequate matching. Our findings underscore the importance of collecting information on environmental risk factors in studies of depression and other complex phenotypes, so that sufficient sample sizes are available to investigate their effect in genome-wide association analysis. Copyright © 2013 Wiley Periodicals, Inc.
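
A sign test on matched pairs can be sketched as follows. The abstract does not specify how genotypes were coded, so the allele-count coding (0, 1, or 2 copies per individual) and the function name are assumptions of this example. Pairs with equal values are dropped, and the two-sided p-value comes from a Binomial(n, 0.5) distribution.

```python
from math import comb


def sign_test(case_values, control_values):
    """Two-sided sign test p-value for matched case-control pairs."""
    plus = sum(c > k for c, k in zip(case_values, control_values))
    minus = sum(c < k for c, k in zip(case_values, control_values))
    n = plus + minus  # tied pairs carry no information and are ignored
    if n == 0:
        return 1.0
    smaller = min(plus, minus)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical allele counts for one variant in six matched pairs.
cases = [2, 1, 2, 1, 2, 2]
controls = [1, 0, 1, 0, 1, 2]
print(sign_test(cases, controls))
```

In a genome-wide setting this test would be repeated for every variant, and the resulting p-values corrected for multiple testing.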

  8. RGmatch: matching genomic regions to proximal genes in omics data integration

    Directory of Open Access Journals (Sweden)

    Pedro Furió-Tarí

    2016-11-01

    Full Text Available Abstract Background The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
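
The general idea of region-to-gene association can be sketched as below. This is not RGmatch's rule set or interface. The three area labels, the distance cutoff, and the tuple layout are hypothetical choices for this example, which works at the gene level only and assumes all features lie on the same chromosome.

```python
def annotate_region(region, genes, max_distance=10_000):
    """Return (gene, area, distance) for the closest gene, or None if none is near.

    region -- (start, end)
    genes  -- iterable of (name, start, end, strand) with strand "+" or "-"
    """
    r_start, r_end = region
    best = None
    for name, g_start, g_end, strand in genes:
        if r_end < g_start:        # region lies left of the gene
            distance = g_start - r_end
            area = "upstream" if strand == "+" else "downstream"
        elif r_start > g_end:      # region lies right of the gene
            distance = r_start - g_end
            area = "downstream" if strand == "+" else "upstream"
        else:                      # region overlaps the gene
            distance = 0
            area = "gene_body"
        if distance <= max_distance and (best is None or distance < best[2]):
            best = (name, area, distance)
    return best


genes = [("geneA", 1000, 2000, "+"), ("geneB", 5000, 6000, "-")]
print(annotate_region((2500, 2600), genes))
print(annotate_region((1500, 1600), genes))
```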

  9. 3-D FEATURE-BASED MATCHING BY RSTG APPROACH

    Directory of Open Access Journals (Sweden)

    J.-J. Jaw

    2012-07-01

    Full Text Available 3-D feature matching is the essential kernel in a fully automated feature-based LiDAR point cloud registration. Once features have been acquired, corresponding features in different data frames must be connected. The objective addressed in this paper is developing an approach coined RSTG to retrieve corresponding counterparts of unsorted multiple 3-D features extracted from sets of LiDAR point clouds. RSTG stands for the four major processes, "Rotation alignment"; "Scale estimation"; "Translation alignment" and "Geometric check," formulated to find the matching solution efficiently and to establish the 3-D similarity transformation among all sets. The feature types that RSTG can work with comprise points, lines, planes and clustered point groups. Each type of feature can be employed exclusively or combined with others, if sufficiently supplied, throughout the matching scheme. The paper gives a detailed description of the matching methodology and discusses the matching results based on a statistical assessment, which showed that the RSTG approach reached an average matching success rate of up to 93% with a statistical type 1 error of around 6.6%. Notably, statistical type 2 error, the critical indicator of matching reliability, was kept at 0% throughout all the experiments.

  10. Patient-controlled encrypted genomic data: an approach to advance clinical genomics

    Directory of Open Access Journals (Sweden)

    Trakadis Yannis J

    2012-07-01

    Full Text Available Abstract Background The revolution in DNA sequencing technologies over the past decade has made it feasible to sequence an individual’s whole genome at a relatively low cost. The potential value of the information generated by genomic technologies for medicine and society is enormous. However, in order for exome sequencing, and eventually whole genome sequencing, to be implemented clinically, a number of major challenges need to be overcome. For instance, obtaining meaningful informed-consent, managing incidental findings and the great volume of data generated (including multiple findings with uncertain clinical significance), re-interpreting the genomic data and providing additional counselling to patients as genetic knowledge evolves are issues that need to be addressed. It appears that medical genetics is shifting from the present “phenotype-first” medical model to a “data-first” model which leads to multiple complexities. Discussion This manuscript discusses the different challenges associated with integrating genomic technologies into clinical practice and describes a “phenotype-first” approach, namely, “Individualized Mutation-weighed Phenotype Search”, and its benefits. The proposed approach allows for a more efficient prioritization of the genes to be tested in a clinical lab based on both the patient’s phenotype and his/her entire genomic data. It simplifies “informed-consent” for clinical use of genomic technologies and helps to protect the patient’s autonomy and privacy. Overall, this approach could potentially render widespread use of genomic technologies, in the immediate future, practical, ethical and clinically useful. Summary The “Individualized Mutation-weighed Phenotype Search” approach allows for an incremental integration of genomic technologies into clinical practice. It ensures that we do not over-medicalize genomic data but, rather, continue our current medical model which is based on serving

  11. Applying Agrep to r-NSA to solve multiple sequences approximate matching.

    Science.gov (United States)

    Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak

    2014-01-01

    This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and indexing a substring in the structure takes constant time. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained by enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude.

  12. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    Science.gov (United States)

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  13. Local Search Approaches in Stable Matching Problems

    Directory of Open Access Journals (Sweden)

    Toby Walsh

    2013-10-01

    Full Text Available The stable marriage (SM) problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools or, more generally, to any two-sided market. In the classical formulation, n men and n women express their preferences (via a strict total order) over the members of the other sex. Solving an SM problem means finding a stable marriage where stability is an envy-free notion: no man and woman who are not married to each other would both prefer each other to their partners or to being single. We consider both the classical stable marriage problem and one of its useful variations (denoted SMTI, for Stable Marriage with Ties and Incomplete lists) where the men and women express their preferences in the form of an incomplete preference list with ties over a subset of the members of the other sex. Matchings are permitted only with people who appear in these preference lists, and we try to find a stable matching that marries as many people as possible. Whilst the SM problem is polynomial to solve, the SMTI problem is NP-hard. We propose to tackle both problems via a local search approach, which exploits properties of the problems to reduce the size of the neighborhood and to make local moves efficiently. We empirically evaluate our algorithm for SM problems by measuring its runtime behavior and its ability to sample the lattice of all possible stable marriages. We evaluate our algorithm for SMTI problems in terms of both its runtime behavior and its ability to find a maximum cardinality stable marriage. Experimental results suggest that for SM problems, the number of steps of our algorithm grows only as O(n log(n)), and that it samples very well the set of all stable marriages. It is thus a fair and efficient approach to generate stable marriages.
Furthermore, our approach for SMTI problems is able to solve large problems, quickly returning stable matchings of large and often optimal size, despite the
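
The stability notion defined in this abstract can be checked directly. The sketch below covers only the classical SM setting (complete strict preference lists, everyone married) and lists the blocking pairs of a given marriage. It is not the authors' local search algorithm, although a local search would plausibly use such a check to choose its moves. All names are hypothetical.

```python
def blocking_pairs(matching, men_prefs, women_prefs):
    """Return all (man, woman) pairs who prefer each other to their current partners.

    matching    -- dict man -> woman (a perfect matching)
    men_prefs   -- dict man -> list of women, most preferred first
    women_prefs -- dict woman -> list of men, most preferred first
    """
    husband = {w: m for m, w in matching.items()}
    pairs = []
    for man, prefs in men_prefs.items():
        for woman in prefs:
            if woman == matching[man]:
                break  # every later woman is less preferred than his wife
            ranking = women_prefs[woman]
            if ranking.index(man) < ranking.index(husband[woman]):
                pairs.append((man, woman))
    return pairs


men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m1", "m2"], "w2": ["m1", "m2"]}
print(blocking_pairs({"m1": "w2", "m2": "w1"}, men, women))  # unstable marriage
print(blocking_pairs({"m1": "w1", "m2": "w2"}, men, women))  # stable marriage
```

A marriage is stable exactly when this list is empty.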

  14. Genome-Wide Approaches to Drosophila Heart Development

    Directory of Open Access Journals (Sweden)

    Manfred Frasch

    2016-05-01

    Full Text Available The development of the dorsal vessel in Drosophila is one of the first systems in which key mechanisms regulating cardiogenesis have been defined in great detail at the genetic and molecular level. Due to evolutionary conservation, these findings have also provided major inputs into studies of cardiogenesis in vertebrates. Many of the major components that control Drosophila cardiogenesis were discovered based on candidate gene approaches and their functions were defined by employing the outstanding genetic tools and molecular techniques available in this system. More recently, approaches have been taken that aim to interrogate the entire genome in order to identify novel components and describe genomic features that are pertinent to the regulation of heart development. Apart from classical forward genetic screens, the availability of the thoroughly annotated Drosophila genome sequence made new genome-wide approaches possible, which include the generation of massive numbers of RNA interference (RNAi) reagents that were used in forward genetic screens, as well as studies of the transcriptomes and proteomes of the developing heart under normal and experimentally manipulated conditions. Moreover, genome-wide chromatin immunoprecipitation experiments have been performed with the aim to define the full set of genomic binding sites of the major cardiogenic transcription factors, their relevant target genes, and a more complete picture of the regulatory network that drives cardiogenesis. This review will give an overview on these genome-wide approaches to Drosophila heart development and on computational analyses of the obtained information that ultimately aim to provide a description of this process at the systems level.

  15. Genome Editing: A New Approach to Human Therapeutics.

    Science.gov (United States)

    Porteus, Matthew

    2016-01-01

    The ability to manipulate the genome with precise spatial and nucleotide resolution (genome editing) has been a powerful research tool. In the past decade, the tools and expertise for using genome editing in human somatic cells and pluripotent cells have increased to such an extent that the approach is now being developed widely as a strategy to treat human disease. The fundamental process depends on creating a site-specific DNA double-strand break (DSB) in the genome and then allowing the cell's endogenous DSB repair machinery to fix the break such that precise nucleotide changes are made to the DNA sequence. With the development and discovery of several different nuclease platforms and increasing knowledge of the parameters affecting different genome editing outcomes, genome editing frequencies now reach therapeutic relevance for a wide variety of diseases. Moreover, there is a series of complementary approaches to assessing the safety and toxicity of any genome editing process, irrespective of the underlying nuclease used. Finally, the development of genome editing has raised the issue of whether it should be used to engineer the human germline. Although such an approach could clearly prevent the birth of people with devastating and destructive genetic diseases, questions remain about whether human society is morally responsible enough to use this tool.

  16. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes.

    Directory of Open Access Journals (Sweden)

    Henrik Bjørn Nielsen

    2007-08-01

    Full Text Available The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental factors including treatments, mutations and pathogen infections. Similarly, drugs may be discovered by the relationship between the transcript profiles effectuated or impacted by a candidate drug and by the target disease. The integration of such data enables systems biology to predict the interplay between experimental factors affecting a biological system. Unfortunately, direct comparisons of gene expression profiles obtained in independent, publicly available microarray experiments are typically compromised by substantial, experiment-specific biases. Here we suggest a novel yet conceptually simple approach for deriving 'Functional Association(s) by Response Overlap' (FARO) between microarray gene expression studies. The transcriptional response is defined by the set of differentially expressed genes independent from the magnitude or direction of the change. This approach overcomes the limited comparability between studies that is typical for methods that rely on correlation in gene expression. We apply FARO to a compendium of 242 diverse Arabidopsis microarray experimental factors, including phyto-hormones, stresses and pathogens, growth conditions/stages, tissue types and mutants. We also use FARO to confirm and further delineate the functions of Arabidopsis MAP kinase 4 in disease and stress responses. Furthermore, we find that a large, well-defined set of genes responds in opposing directions to different stress conditions and predict the effects of different stress combinations. This demonstrates the usefulness of our approach for exploiting public microarray data to derive biologically meaningful associations between experimental factors.
Finally, our
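
The response-overlap idea can be illustrated with a small sketch. Each experimental factor is represented only by its set of differentially expressed genes, ignoring magnitude and direction, as the abstract describes. Scoring the overlap with a hypergeometric tail probability is an assumption of this example, not necessarily the statistic used by FARO, and the gene identifiers are made up.

```python
from math import comb


def overlap_pvalue(set_a, set_b, n_genes):
    """P(overlap >= observed) if set_b were drawn at random from n_genes genes."""
    observed = len(set_a & set_b)
    a, b = len(set_a), len(set_b)
    total = comb(n_genes, b)
    tail = sum(comb(a, i) * comb(n_genes - a, b - i)
               for i in range(observed, min(a, b) + 1))
    return tail / total


# Hypothetical responses measured on an array of 10 genes.
drought = {"g1", "g2", "g3"}
salt = {"g1", "g2", "g3"}
cold = {"g7", "g8"}
print(overlap_pvalue(drought, salt, n_genes=10))  # complete overlap
print(overlap_pvalue(drought, cold, n_genes=10))  # no overlap
```

A small p-value suggests that two factors affect a shared set of genes more often than expected by chance.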

  17. Gene-trait matching across the Bifidobacterium longum pan-genome reveals considerable diversity in carbohydrate catabolism among human infant strains.

    LENUS (Irish Health Repository)

    Arboleya, Silvia

    2018-01-08

    Bifidobacterium longum is a common member of the human gut microbiota and is frequently present at high numbers in the gut microbiota of humans throughout life, thus indicative of a close symbiotic host-microbe relationship. Different mechanisms may be responsible for the high competitiveness of this taxon in its human host to allow stable establishment in the complex and dynamic intestinal microbiota environment. The objective of this study was to assess the genetic and metabolic diversity in a set of 20 B. longum strains, most of which had previously been isolated from infants, by performing whole genome sequencing and comparative analysis, and to analyse their carbohydrate utilization abilities using a gene-trait matching approach.
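
Gene-trait matching can be sketched as a comparison of gene presence/absence with a binary phenotype across strains. This is a simplified illustration under assumed data structures, not the study's analysis: a gene is reported when its presence pattern agrees with the growth phenotype in at least a chosen fraction of strains. Strain and gene names are hypothetical.

```python
def gene_trait_matches(presence, phenotype, min_agreement=1.0):
    """Return (gene, agreement) for genes whose presence tracks the phenotype.

    presence  -- dict gene -> set of strains carrying the gene
    phenotype -- dict strain -> True if the strain grows on the substrate
    """
    strains = list(phenotype)
    hits = []
    for gene, carriers in presence.items():
        agree = sum((s in carriers) == phenotype[s] for s in strains)
        fraction = agree / len(strains)
        if fraction >= min_agreement:
            hits.append((gene, fraction))
    return sorted(hits, key=lambda hit: -hit[1])


presence = {"geneX": {"s1", "s2"}, "geneY": {"s1", "s3"}}
grows_on_sugar = {"s1": True, "s2": True, "s3": False}
print(gene_trait_matches(presence, grows_on_sugar))
```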

  18. Microbial genome analysis: the COG approach.

    Science.gov (United States)

    Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2017-09-14

    For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  19. Cross-species genomics matches driver mutations and cell compartments to model ependymoma

    Science.gov (United States)

    Johnson, Robert A.; Wright, Karen D.; Poppleton, Helen; Mohankumar, Kumarasamypet M.; Finkelstein, David; Pounds, Stanley B.; Rand, Vikki; Leary, Sarah E.S.; White, Elsie; Eden, Christopher; Hogg, Twala; Northcott, Paul; Mack, Stephen; Neale, Geoffrey; Wang, Yong-Dong; Coyle, Beth; Atkinson, Jennifer; DeWire, Mariko; Kranenburg, Tanya A.; Gillespie, Yancey; Allen, Jeffrey C.; Merchant, Thomas; Boop, Fredrick A.; Sanford, Robert A.; Gajjar, Amar; Ellison, David W.; Taylor, Michael D.; Grundy, Richard G.; Gilbertson, Richard J.

    2010-01-01

    Understanding the biology that underlies histologically similar but molecularly distinct subgroups of cancer has proven difficult since their defining genetic alterations are often numerous, and the cellular origins of most cancers remain unknown [1–3]. We sought to decipher this heterogeneity by integrating matched genetic alterations and candidate cells of origin to generate accurate disease models. First, we identified subgroups of human ependymoma, a form of neural tumor that arises throughout the central nervous system (CNS). Subgroup specific alterations included amplifications and homozygous deletions of genes not yet implicated in ependymoma. To select cellular compartments most likely to give rise to subgroups of ependymoma, we matched the transcriptomes of human tumors to those of mouse neural stem cells (NSCs), isolated from different regions of the CNS at different developmental stages, with an intact or deleted Ink4a/Arf locus. The transcriptome of human cerebral ependymomas with amplified EPHB2 and deleted INK4A/ARF matched only that of embryonic cerebral Ink4a/Arf−/− NSCs. Remarkably, activation of Ephb2 signaling in these, but not other NSCs, generated the first mouse model of ependymoma, which is highly penetrant and accurately models the histology and transcriptome of one subgroup of human cerebral tumor. Further comparative analysis of matched mouse and human tumors revealed selective deregulation in the expression and copy number of genes that control synaptogenesis, pinpointing disruption of this pathway as a critical event in the production of this ependymoma subgroup. Our data demonstrate the power of cross-species genomics to meticulously match subgroup specific driver mutations with cellular compartments to model and interrogate cancer subgroups. PMID:20639864

  20. An efficient approach to BAC based assembly of complex genomes.

    Science.gov (United States)

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  1. Omics and Environmental Science Genomic Approaches With Natural Fish Populations From Polluted Environments

    Science.gov (United States)

    Bozinovic, Goran; Oleksiak, Marjorie F.

    2010-01-01

    Transcriptomics and population genomics are two complementary genomic approaches that can be used to gain insight into pollutant effects in natural populations. Transcriptomics identifies altered gene expression pathways, while population genomics approaches more directly target the causative genomic polymorphisms. Neither approach is restricted to a pre-determined set of genes or loci. Instead, both approaches allow a broad overview of genomic processes. Transcriptomics and population genomic approaches have been used to explore genomic responses in populations of fish from polluted environments and have identified sets of candidate genes and loci that appear biologically important in response to pollution. Often, differences in gene expression or loci between polluted and reference populations are not conserved among polluted populations, suggesting a biological complexity that we do not yet fully understand. As genomic approaches become less expensive with the advent of new sequencing and genotyping technologies, they will be more widely used in complementary studies. However, while these genomic approaches are immensely powerful for identifying candidate genes and loci, the challenge of determining biological mechanisms that link genotypes and phenotypes remains. PMID:21072843

  2. PATTERN CLASSIFICATION APPROACHES TO MATCHING BUILDING POLYGONS AT MULTIPLE SCALES

    Directory of Open Access Journals (Sweden)

    X. Zhang

    2012-07-01

    Matching of building polygons with different levels of detail is crucial in the maintenance and quality assessment of multi-representation databases. Two general problems need to be addressed in the matching process: (1) Which criteria are suitable? (2) How can different criteria be combined effectively to make decisions? This paper mainly focuses on the second issue and views data matching as a supervised pattern classification problem. Several classifiers (i.e. decision trees, Naive Bayes and support vector machines) are evaluated for the matching task. Four criteria (i.e. position, size, shape and orientation) are used to extract information for these classifiers. Evidence shows that these classifiers outperformed the weighted average approach.

  3. An evaluation of the genetic-matched pair study design using genome-wide SNP data from the European population

    DEFF Research Database (Denmark)

    Lu, Timothy Tehua; Lao, Oscar; Nothnagel, Michael

    2009-01-01

    Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix GeneChip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given... of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable...
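
    The matching criterion described in this record can be illustrated with a small sketch. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) per marker and that the IBS score is the mean number of shared alleles scaled to [0, 1]; the function names and the toy data layout are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def ibs_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean identity-by-state over markers, scaled to the range [0, 1].

    Assumes each genotype is a minor-allele count (0, 1 or 2), so two
    individuals share 2 - |a - b| alleles at a marker.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and equal in length")
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return shared / (2 * len(a))


def best_overall_match(genotypes: List[Sequence[int]], index: int) -> int:
    """Index of the individual with the highest IBS to `index` (excluding itself)."""
    best, best_score = -1, -1.0
    for j, other in enumerate(genotypes):
        if j == index:
            continue
        score = ibs_similarity(genotypes[index], other)
        if score > best_score:
            best, best_score = j, score
    return best


def fraction_matched_elsewhere(genotypes: List[Sequence[int]], sites: List[str]) -> float:
    """Fraction of individuals whose best match comes from a different site."""
    n = len(genotypes)
    different = sum(
        1 for i in range(n) if sites[best_overall_match(genotypes, i)] != sites[i]
    )
    return different / n
```

    With real data, `genotypes` would hold one vector per individual and `sites` the corresponding recruitment sites; the last function then gives the kind of proportion the abstract reports for the complete marker set.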

  4. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Mundy, J.; Willenbrock, Hanni

    2007-01-01

    The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental facto...

  5. Evolving approaches to the ethical management of genomic data.

    Science.gov (United States)

    McEwen, Jean E; Boyer, Joy T; Sun, Kathie Y

    2013-06-01

    The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science. Published by Elsevier Ltd.

  6. De-anonymizing Genomic Databases Using Phenotypic Traits

    Directory of Open Access Journals (Sweden)

    Humbert Mathias

    2015-06-01

    People increasingly have their genomes sequenced and some of them share their genomic data online. They do so for various purposes, including to find relatives and to help advance genomic research. An individual’s genome carries very sensitive, private information such as its owner’s susceptibility to diseases, which could be used for discrimination. Therefore, genomic databases are often anonymized. However, an individual’s genotype is also linked to visible phenotypic traits, such as eye or hair color, which can be used to re-identify users in anonymized public genomic databases, thus raising severe privacy issues. For instance, an adversary can identify a target’s genome using her known phenotypic traits and subsequently infer her susceptibility to Alzheimer’s disease. In this paper, we quantify, based on various phenotypic traits, the extent of this threat in several scenarios by implementing de-anonymization attacks on a genomic database of OpenSNP users sequenced by 23andMe. Our experimental results show that the proportion of correct matches reaches 23% with a supervised approach in a database of 50 participants. Our approach outperforms the baseline by a factor of four, in terms of the proportion of correct matches, in most scenarios. We also evaluate the adversary’s ability to predict individuals’ predisposition to Alzheimer’s disease, and we observe that the inference error can be halved compared to the baseline. We also analyze the effect of the number of known phenotypic traits on the success rate of the attack. As progress is made in genomic research, especially for genotype-phenotype associations, the threat presented in this paper will become more serious.

  7. AN AERIAL-IMAGE DENSE MATCHING APPROACH BASED ON OPTICAL FLOW FIELD

    Directory of Open Access Journals (Sweden)

    W. Yuan

    2016-06-01

    Dense matching plays an important role in many fields, such as DEM (digital elevation model) production, robot navigation and 3D environment reconstruction. Traditional approaches may meet the accuracy requirements, but their computation time and output density are often unacceptable. Focusing on matching efficiency and on the feasibility of matching complex terrain surfaces, this paper proposes an aerial-image dense matching method based on the optical flow field. First, highly accurate and uniformly distributed control points are extracted using a feature-based matching method. The optical flow is then calculated from these control points to determine the similar region between the two images. Second, the optical flow field is interpolated within the similar region using multi-level B-spline interpolation, which accomplishes pixel-by-pixel coarse matching. Finally, the coarse matching results are refined using a combined constraint, which identifies the same points between images. The experimental results show that the method achieves per-pixel dense matching points with sub-pixel matching accuracy, and fully meets the dense matching requirements of three-dimensional reconstruction and automatic DSM generation. The comparison experiments demonstrate that the approach’s matching efficiency is higher than that of semi-global matching (SGM) and patch-based multi-view stereo matching (PMVS), which verifies the feasibility and effectiveness of the algorithm.

  8. Genomic Approaches in Marine Biodiversity and Aquaculture

    Directory of Open Access Journals (Sweden)

    Jorge A Huete-Pérez

    2013-01-01

    Recent advances in genomic and post-genomic technologies have now established the new standard in medical and biotechnological research. The introduction of next-generation sequencing (NGS) has resulted in the generation of thousands of genomes from all domains of life, including the genomes of complex uncultured microbial communities revealed through metagenomics. Although the application of genomics to marine biodiversity remains poorly developed overall, some noteworthy progress has been made in recent years. The genomes of various model marine organisms have been published and a few more are underway. In addition, the recent large-scale analysis of marine microbes, along with transcriptomic and proteomic approaches to the study of teleost fishes, mollusks and crustaceans, to mention a few, has provided a better understanding of phenotypic variability and functional genomics. The past few years have also seen advances in applications relevant to marine aquaculture and fisheries. In this review we introduce several examples of recent discoveries and progress made towards engendering genomic resources aimed at enhancing our understanding of marine biodiversity and promoting the development of aquaculture. Finally, we discuss the need for auspicious science policies to address challenges confronting smaller nations in the appropriate oversight of this growing domain as they strive to guarantee food security and conservation of their natural resources.

  9. mEBT: multiple-matching Evidence-based Translator of Murine Genomic Responses for Human Immunity Studies.

    Science.gov (United States)

    Tae, Donghyun; Seok, Junhee

    2018-05-29

    In this paper, we introduce the multiple-matching Evidence-based Translator (mEBT) to discover genomic responses from murine expression data for human immune studies; these responses are significant in the given condition in mice and are likely to be similar in the corresponding condition in humans. mEBT is evaluated over multiple data sets and shows improved inter-species agreement. mEBT is expected to be useful for research groups who use murine models to study human immunity. Availability: http://cdal.korea.ac.kr/mebt/. Contact: jseok14@korea.ac.kr. Supplementary data are available at Bioinformatics online.

  10. Approaches for in silico finishing of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Frederico Schmitt Kremer

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  11. Approaches for in silico finishing of microbial genome sequences.

    Science.gov (United States)

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  12. A matching approach to communicate through the plasma sheath surrounding a hypersonic vehicle

    International Nuclear Information System (INIS)

    Gao, Xiaotian; Jiang, Binhao

    2015-01-01

    To overcome the communication blackout problem suffered by hypersonic vehicles, this paper proposes a matching approach for the first time. It uses a double-positive (DPS) material layer surrounding a hypersonic vehicle antenna to match the plasma sheath enclosing the vehicle. Theoretical analysis and numerical results indicate that, under some conditions, a resonance forms between the matched layer and the plasma sheath and mitigates the blackout problem. The calculated results show perfect radiation performance of the antenna when the match between these two layers is exact. The effects of the plasma sheath parameters have been studied by numerical methods. Based on these results, the proposed approach is easier to realize and more flexible under the varying radiation conditions of hypersonic flight compared with other methods.

  13. Genome-Wide Association Studies In Plant Pathosystems: Toward an Ecological Genomics Approach

    Directory of Open Access Journals (Sweden)

    Claudia Bartoli

    2017-05-01

    The emergence and re-emergence of plant pathogenic microorganisms are processes that imply perturbations in both host and pathogen ecological niches. Global change is largely assumed to drive the emergence of new etiological agents by altering the equilibrium of ecological habitats, which in turn places hosts in closer contact with pathogen reservoirs. In this context, the number of epidemics is expected to increase dramatically in the coming decades in both wild and crop plants. Under these considerations, the identification of the genetic variants underlying natural variation of resistance is a prerequisite for estimating the adaptive potential of wild plant populations and for developing new resistant cultivars. On the other hand, the prediction of the pathogen's genetic determinants underlying disease emergence can help to identify plant resistance alleles. In the genomic era, whole genome sequencing combined with the development of statistical methods led to the emergence of Genome Wide Association (GWA) mapping, a powerful tool for detecting genomic regions associated with natural variation of disease resistance in both wild and cultivated plants. However, GWA mapping has been less employed for the detection of genetic variants associated with pathogenicity in microbes. Here, we reviewed GWA studies performed either in plants or in pathogenic microorganisms (bacteria, fungi and oomycetes). In addition, we highlighted the benefits and caveats of the emerging joint GWA mapping approach that allows for the simultaneous identification of genes interacting between genomes of both partners. Finally, based on co-evolutionary processes in wild populations, we highlighted a phenotyping-free joint GWA mapping approach as a promising tool for describing the molecular landscape underlying plant-microbe interactions.

  14. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization.

    Science.gov (United States)

    Wang, Qingguo; Jia, Peilin; Zhao, Zhongming

    2015-01-01

    Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.

  15. Genotyping-by-sequencing for Populus population genomics: an assessment of genome sampling patterns and filtering approaches.

    Directory of Open Access Journals (Sweden)

    Martin P Schilling

    Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-scale data acquisition and processing. Here we assess genomic sampling biases and the effects of various population-level data filtering strategies in a genotyping-by-sequencing (GBS) protocol. We focus on data from two species of Populus, because this genus has a relatively small genome and is emerging as a target for population genomic studies. We estimate the proportions and patterns of genomic sampling by examining the Populus trichocarpa genome (Nisqually-1), and demonstrate a pronounced bias towards coding regions when using the methylation-sensitive ApeKI restriction enzyme in this species. Using population-level data from a closely related species (P. tremuloides), we also investigate various approaches for filtering GBS data to retain high-depth, informative SNPs that can be used for population genetic analyses. We find that a data filter that includes the designation of ambiguous alleles resulted in metrics of population structure and Hardy-Weinberg equilibrium that were most consistent with previous studies of the same populations based on other genetic markers. Analyses of the filtered data (27,910 SNPs) also resulted in patterns of heterozygosity and population structure similar to a previous study using microsatellites. Our application demonstrates that technically and analytically simple approaches can readily be developed for population genomics of natural populations.

  16. mpscan: Fast Localisation of Multiple Reads in Genomes

    Science.gov (United States)

    Rivals, Eric; Salmela, Leena; Kiiskinen, Petteri; Kalsi, Petri; Tarhio, Jorma

    With Next Generation Sequencers, sequence-based transcriptomic or epigenomic assays yield millions of short sequence reads that need to be mapped back onto a reference genome. The upcoming versions of these sequencers promise even higher sequencing capacities; this may turn the read mapping task into a bottleneck for which alternative pattern matching approaches must be explored. We present an algorithm and its implementation, called mpscan, which uses a sophisticated filtration scheme to match a set of patterns/reads exactly on a sequence. mpscan can search for millions of reads in a single pass through the genome without indexing its sequence. Moreover, we show that mpscan offers an optimal average time complexity, which is sublinear in the text length, meaning that it does not need to examine all sequence positions. Comparisons with BLAT-like tools and with six specialised read mapping programs (like bowtie or zoom) demonstrate that mpscan is also the fastest algorithm in practice for exact matching. Our accuracy and scalability comparisons reveal that some tools are inappropriate for read mapping. Moreover, we provide evidence suggesting that exact matching may be a valuable solution in some read mapping applications. As most read mapping programs somehow rely on exact matching procedures to perform approximate pattern mapping, the filtration scheme we tested may prove useful in the design of future algorithms. The absence of a genome index gives mpscan its low memory requirement and a flexibility that lets it run on a desktop computer and avoids time-consuming genome preprocessing.
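
    The general idea of locating many reads exactly in one pass over an unindexed genome can be sketched as follows. This is not mpscan's filtration scheme and it is not sublinear: it is a simplified prefix filter that assumes all reads have the same length, and the function name and the parameter `q` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def locate_reads(genome: str, reads: List[str], q: int = 4) -> Dict[str, List[int]]:
    """Report every exact occurrence of each read in a single genome scan.

    The reads (not the genome) are preprocessed: they are grouped by their
    first q characters. While scanning, a window is only compared in full
    when its q-character prefix matches the prefix of at least one read.
    """
    if not reads:
        return {}
    m = len(reads[0])
    if any(len(r) != m for r in reads):
        raise ValueError("this sketch assumes reads of equal length")
    if not 0 < q <= m:
        raise ValueError("q must be between 1 and the read length")

    # Filtration table built from the reads only; the genome is never indexed.
    by_prefix = defaultdict(set)
    for read in reads:
        by_prefix[read[:q]].add(read)

    hits: Dict[str, List[int]] = {read: [] for read in reads}
    for pos in range(len(genome) - m + 1):
        candidates = by_prefix.get(genome[pos:pos + q])
        if not candidates:
            continue  # filtered out: no read starts with this q-gram
        window = genome[pos:pos + m]
        if window in candidates:  # verification step
            hits[window].append(pos)
    return hits
```

    For example, `locate_reads("ACGTACGTTTACGA", ["ACGTA", "CGTTT", "GGGGG"], q=3)` reports position 0 for the first read, position 5 for the second and no position for the third. Memory use depends on the read set rather than on the genome, which is the property the abstract attributes to working without a genome index.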

  17. Genomics approaches in the understanding of Entamoeba ...

    African Journals Online (AJOL)

    2009-04-20

    Apr 20, 2009 ... Here, we reviewed recent advances in the efforts to understand ... expression regulation in E. histolytica by using genomic approaches based on microarray technology ... tic abscesses that result in approximately 70,000 -.

  18. A hybrid clustering approach to recognition of protein families in 114 microbial genomes

    Directory of Open Access Journals (Sweden)

    Gogarten J Peter

    2004-04-01

    Background: Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g. as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families. Results: We describe a hybrid approach to sequence-based clustering of proteins that combines the advantages of standard and Markov clustering. We have implemented this hybrid approach over a relational database environment, and describe its application to clustering a large subset of PDB, and to 328577 proteins from 114 fully sequenced microbial genomes. To demonstrate utility with difficult problems, we show that hybrid clustering allows us to constitute the paralogous family of ATP synthase F1 rotary motor subunits into a single, biologically interpretable hierarchical grouping that was not accessible using either single-linkage or Markov clustering alone. We describe validation of this method by hybrid clustering of PDB and mapping SCOP families and domains onto the resulting clusters. Conclusion: Hybrid (Markov followed by single-linkage) clustering combines the advantages of the Markov Cluster algorithm (avoidance of non-specific clusters resulting from matches to promiscuous domains) and single-linkage clustering (preservation of topological information as a function of threshold). Within the individual Markov clusters, single-linkage clustering is a more-precise instrument, discerning sub-clusters of biological relevance. Our hybrid approach thus provides a computationally efficient
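
    The second stage of the hybrid scheme, single-linkage clustering inside each Markov cluster, can be sketched as below. The sketch assumes the Markov clusters have already been computed by some external tool and that pairwise similarity scores are available; the function names and input layout are hypothetical, and this is not the authors' relational-database implementation.

```python
from typing import Dict, List, Set, Tuple


def _find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def single_linkage_within(
    markov_clusters: List[Set[str]],
    scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[List[str]]]:
    """Split each Markov cluster into single-linkage sub-clusters.

    Two proteins are linked when their similarity score is at least
    `threshold` and both belong to the same Markov cluster; links that
    cross Markov clusters are ignored.
    """
    result = []
    for members in markov_clusters:
        parent = {protein: protein for protein in members}
        for (a, b), score in scores.items():
            if a in parent and b in parent and score >= threshold:
                parent[_find(parent, a)] = _find(parent, b)
        groups: Dict[str, Set[str]] = {}
        for protein in members:
            groups.setdefault(_find(parent, protein), set()).add(protein)
        result.append(sorted(sorted(group) for group in groups.values()))
    return result
```

    Running the function at a series of decreasing thresholds recovers the threshold-dependent hierarchy within each Markov cluster, while a high-scoring link between two different Markov clusters (for instance through a promiscuous domain) never merges them.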

  19. Genomic and Functional Approaches to Understanding Cancer Aneuploidy.

    Science.gov (United States)

    Taylor, Alison M; Shih, Juliann; Ha, Gavin; Gao, Galen F; Zhang, Xiaoyang; Berger, Ashton C; Schumacher, Steven E; Wang, Chen; Hu, Hai; Liu, Jianfang; Lazar, Alexander J; Cherniack, Andrew D; Beroukhim, Rameen; Meyerson, Matthew

    2018-04-09

    Aneuploidy, whole chromosome or chromosome arm imbalance, is a near-universal characteristic of human cancers. In 10,522 cancer genomes from The Cancer Genome Atlas, aneuploidy was correlated with TP53 mutation, somatic mutation rate, and expression of proliferation genes. Aneuploidy was anti-correlated with expression of immune signaling genes, due to decreased leukocyte infiltrates in high-aneuploidy samples. Chromosome arm-level alterations show cancer-specific patterns, including loss of chromosome arm 3p in squamous cancers. We applied genome engineering to delete 3p in lung cells, causing decreased proliferation rescued in part by chromosome 3 duplication. This study defines genomic and phenotypic correlates of cancer aneuploidy and provides an experimental approach to study chromosome arm aneuploidy. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  20. An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes

    Directory of Open Access Journals (Sweden)

    Clémentine Henri

    2017-11-01

    Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The value of genome-scale analysis for national surveillance, outbreak detection or source tracking has been widely documented. The genomic data, however, can be exploited with many different bioinformatics methods, such as single nucleotide polymorphism (SNP), core-genome multi locus sequence typing (cgMLST), whole-genome multi locus sequence typing (wgMLST) or multi locus predicted protein sequence typing (MLPPST) on either the core genome (cgMLPPST) or the pan-genome (wgMLPPST). Currently, there are few comparative studies of these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L. monocytogenes. Methods: The clustering methods were evaluated on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses. Results: The backward comparability between conventional typing methods and genomic methods revealed a near-perfect concordance. The importance of selecting a proper reference when calling SNPs was highlighted, although distances between strains remained identical. The analysis also revealed that the topologies of the wgMLST and cgMLST phylogenetic trees were remarkably similar. The comparison between SNP and cgMLST or SNP and wgMLST approaches showed that the topologies of the phylogenetic trees were statistically similar, with an almost equivalent clustering. Conclusion: Our study revealed high concordance between wgMLST, cgMLST, and SNP approaches, which are all suitable for typing of L. monocytogenes. The comparable clustering is an important observation considering that the two approaches have been variously implemented among reference laboratories.

  1. Matching theory

    CERN Document Server

    Plummer, MD

    1986-01-01

    This study of matching theory deals with bipartite matching, network flows, and presents fundamental results for the non-bipartite case. It goes on to study elementary bipartite graphs and elementary graphs in general. Further discussed are 2-matchings, general matching problems as linear programs, the Edmonds Matching Algorithm (and other algorithmic approaches), f-factors and vertex packing.
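
    As a small illustration of the first topic listed in this record, the sketch below computes a maximum matching in a bipartite graph with the classical augmenting-path method. It is a generic textbook technique and is not taken from the book; the function name and the adjacency-list input format are hypothetical.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Maximum matching via repeated augmenting-path search.

    `adj` maps each left vertex to the right vertices it may be paired
    with. Returns a dict from matched left vertices to their partners.
    """
    match_right: Dict[str, str] = {}  # right vertex -> matched left vertex

    def try_augment(u: str, seen: Set[str]) -> bool:
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be moved elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return {u: v for v, u in match_right.items()}
```

    For the graph `{"x1": ["y1", "y2"], "x2": ["y1"], "x3": ["y2", "y3"]}`, vertex x1 is first paired with y1 and then moved to y2 so that x2 can take y1, giving a matching of size three.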

  2. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse | Office of Cancer Genomics

    Science.gov (United States)

    The genomic and clinical information used to develop and implement therapeutic approaches for AML originated primarily from adult patients and has been generalized to patients with pediatric AML. However, age-specific molecular alterations are becoming more evident and may signify the need to age-stratify treatment regimens. The NCI/COG TARGET-AML initiative employed whole exome capture sequencing (WXS) to interrogate the genomic landscape of matched trios representing specimens collected upon diagnosis, remission, and relapse from 20 cases of de novo childhood AML.

  3. CRISPR/Cas9: A Practical Approach in Date Palm Genome Editing

    Directory of Open Access Journals (Sweden)

    Muhammad N. Sattar

    2017-08-01

    Genetic modification of crop plants through breeding has long been used to improve yield and quality. However, precise genome editing (GE) could be a very useful supplementary tool for the improvement of crop plants by targeted genome modifications. Various GE techniques, including ZFNs (zinc finger nucleases), TALENs (transcription activator-like effector nucleases), and most recently clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 (CRISPR-associated protein 9)-based approaches, have been successfully employed for various crop plants including fruit trees. CRISPR/Cas9-based approaches hold great potential in GE due to their simplicity, competency, and versatility over other GE techniques. However, to the best of our knowledge no such genetic improvement has ever been developed in date palm, an important fruit crop in oasis agriculture. The application of CRISPR/Cas9 can be a challenging task in date palm GE due to its large and complex genome, high rate of heterozygosity and outcrossing, in vitro regeneration and screening of mutants, high frequency of single-nucleotide polymorphism in the genome and ultimately genetic instability. In this review, we addressed the potential application of CRISPR/Cas9-based approaches in date palm GE to improve sustainable date palm production. The availability of the date palm whole genome sequence has made it feasible to use the CRISPR/Cas9 GE approach for genetic improvement in this species. Moreover, the future prospects of GE application in date palm are also addressed in this review.

  4. Combining genomic and proteomic approaches for epigenetics research

    Science.gov (United States)

    Han, Yumiao; Garcia, Benjamin A

    2014-01-01

    Epigenetics is the study of changes in gene expression or cellular phenotype that do not change the DNA sequence. In this review, current methods, both genomic and proteomic, associated with epigenetics research are discussed. Among them, chromatin immunoprecipitation (ChIP) followed by sequencing and other ChIP-based techniques are powerful techniques for genome-wide profiling of DNA-binding proteins, histone post-translational modifications or nucleosome positions. However, mass spectrometry-based proteomics is increasingly being used in functional biological studies and has proved to be an indispensable tool to characterize histone modifications, as well as DNA–protein and protein–protein interactions. With the development of genomic and proteomic approaches, combination of ChIP and mass spectrometry has the potential to expand our knowledge of epigenetics research to a higher level. PMID:23895656

  5. Patterns of somatic alterations between matched primary and metastatic colorectal tumors characterized by whole-genome sequencing.

    Science.gov (United States)

    Xie, Tao; Cho, Yong Beom; Wang, Kai; Huang, Donghui; Hong, Hye Kyung; Choi, Yoon-La; Ko, Young Hyeh; Nam, Do-Hyun; Jin, Juyoun; Yang, Heekyoung; Fernandez, Julio; Deng, Shibing; Rejto, Paul A; Lee, Woo Yong; Mao, Mao

    2014-10-01

    Colorectal cancer (CRC) patients have poor prognosis after formation of distant metastasis. Understanding the molecular mechanisms by which genetic changes facilitate metastasis is critical for the development of targeted therapeutic strategies aimed at controlling disease progression while minimizing toxic side effects. A comprehensive portrait of somatic alterations in CRC and the changes between primary and metastatic tumors has yet to be developed. We performed whole genome sequencing of two primary CRC tumors and their matched liver metastases. By comparing to matched germline DNA, we catalogued somatic alterations at multiple scales, including single nucleotide variations, small insertions and deletions, copy number aberrations and structural variations in both the primary and matched metastasis. We found that the majority of these somatic alterations are present in both sites. Despite the overall similarity, several de novo alterations in the metastases were predicted to be deleterious, in genes including FBXW7, DCLK1 and FAT2, which might contribute to the initiation and progression of distant metastasis. Through careful examination of the mutation prevalence among tumor cells at each site, we also proposed distinct clonal evolution patterns between primary and metastatic tumors in the two cases. These results suggest that somatic alterations may play an important role in driving the development of colorectal cancer metastasis and present challenges and opportunities when considering the choice of treatment. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Functional genomics approaches in parasitic helminths.

    Science.gov (United States)

    Hagen, J; Lee, E F; Fairlie, W D; Kalinna, B H

    2012-01-01

    As research on parasitic helminths is moving into the post-genomic era, an enormous effort is directed towards deciphering gene function and to achieve gene annotation. The sequences that are available in public databases undoubtedly hold information that can be utilized for new interventions and control but the exploitation of these resources has until recently remained difficult. Only now, with the emergence of methods to genetically manipulate and transform parasitic worms will it be possible to gain a comprehensive understanding of the molecular mechanisms involved in nutrition, metabolism, developmental switches/maturation and interaction with the host immune system. This review focuses on functional genomics approaches in parasitic helminths that are currently used, to highlight potential applications of these technologies in the areas of cell biology, systems biology and immunobiology of parasitic helminths. © 2011 Blackwell Publishing Ltd.

  7. Privacy‐Preserving Friend Matching Protocol approach for Pre‐match in Social Networks

    DEFF Research Database (Denmark)

    Ople, Shubhangi S.; Deshmukh, Aaradhana A.; Mihovska, Albena Dimitrova

    2016-01-01

    Social services make the most use of the user profile matching to help the users to discover friends with similar social attributes (e.g. interests, location, age). However, there are many privacy concerns that prevent to enable this functionality. Privacy preserving encryption is not suitable...... for use in social networks due to its data sharing problems and information leakage. In this paper, we propose a novel framework for privacy–preserving profile matching. We implement both the client and server portion of the secure match and evaluate its performance network dataset. The results show...

  8. MinGenome: An In Silico Top-Down Approach for the Synthesis of Minimized Genomes.

    Science.gov (United States)

    Wang, Lin; Maranas, Costas D

    2018-02-16

    Genome minimized strains offer advantages as production chassis by reducing transcriptional cost, eliminating competing functions and limiting unwanted regulatory interactions. Existing approaches for identifying stretches of DNA to remove are largely ad hoc based on information on presumably dispensable regions through experimentally determined nonessential genes and comparative genomics. Here we introduce a versatile genome reduction algorithm MinGenome that implements a mixed-integer linear programming (MILP) algorithm to identify in size descending order all dispensable contiguous sequences without affecting the organism's growth or other desirable traits. Known essential genes or genes that cause significant fitness or performance loss can be flagged and their deletion can be prohibited. MinGenome also preserves needed transcription factors and promoter regions ensuring that retained genes will be properly transcribed while also avoiding the simultaneous deletion of synthetic lethal pairs. The potential benefit of removing even larger contiguous stretches of DNA if only one or two essential genes (to be reinserted elsewhere) are within the deleted sequence is explored. We applied the algorithm to design a minimized E. coli strain and found that we were able to recapitulate the long deletions identified in previous experimental studies and discover alternative combinations of deletions that have not yet been explored in vivo.

  9. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    Science.gov (United States)

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for their metabolites characterization. An increase in the use of this approach has been observed for the last couple of years along with the emergence of low cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.

  10. An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes

    DEFF Research Database (Denmark)

    Henri, Clementine; Leekitcharoenphon, Pimlapas; Carleton, Heather A.

    2017-01-01

    Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The interests of genome-scale analysis for national surveillance, outbreak detection or source tracking has been largely documented. The genomic......MLPPST) or pan genome (wgMLPPST). Currently, there are little comparisons studies of these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L monocytogenes.Methods: The clustering methods were evaluated...... on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses.Results: The backward comparability between conventional typing methods and genomic methods revealed a near...

  11. Precise detection of de novo single nucleotide variants in human genomes.

    Science.gov (United States)

    Gómez-Romero, Laura; Palacios-Flores, Kim; Reyes, José; García, Delfino; Boege, Margareta; Dávila, Guillermo; Flores, Margarita; Schatz, Michael C; Palacios, Rafael

    2018-05-07

    The precise determination of de novo genetic variants has enormous implications across different fields of biology and medicine, particularly personalized medicine. Currently, de novo variations are identified by mapping sample reads from a parent-offspring trio to a reference genome, allowing for a certain degree of differences. While widely used, this approach often introduces false-positive (FP) results due to misaligned reads and mischaracterized sequencing errors. In a previous study, we developed an alternative approach to accurately identify single nucleotide variants (SNVs) using only perfect matches. However, this approach could be applied only to haploid regions of the genome and was computationally intensive. In this study, we present a unique approach, coverage-based single nucleotide variant identification (COBASI), which allows the exploration of the entire genome using second-generation short sequence reads without extensive computing requirements. COBASI identifies SNVs using changes in coverage of exactly matching unique substrings, and is particularly suited for pinpointing de novo SNVs. Unlike other approaches that require population frequencies across hundreds of samples to filter out any methodological biases, COBASI can be applied to detect de novo SNVs within isolated families. We demonstrate this capability through extensive simulation studies and by studying a parent-offspring trio we sequenced using short reads. Experimental validation of all 58 candidate de novo SNVs and a selection of non-de novo SNVs found in the trio confirmed zero FP calls. COBASI is available as open source at https://github.com/Laura-Gomez/COBASI for any researcher to use. Copyright © 2018 the Author(s). Published by PNAS.
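    The coverage idea in the abstract above can be illustrated with a minimal sketch. It is not the COBASI implementation: the function names, the tiny sequences and the choice of k are assumptions made for illustration. It counts how often each unique reference k-mer occurs exactly in the reads; a single-nucleotide change removes every k-mer that overlaps it, so coverage drops to zero over a run of k consecutive start positions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer that occurs in the reads (exact matches only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_profile(reference, reads, k):
    """Coverage of each reference k-mer by exactly matching read k-mers.

    Positions whose k-mer is not unique in the reference are reported as
    None, because their coverage cannot be attributed to a single locus.
    """
    read_counts = kmer_counts(reads, k)
    ref_counts = Counter(reference[i:i + k]
                         for i in range(len(reference) - k + 1))
    profile = []
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        profile.append(read_counts[kmer] if ref_counts[kmer] == 1 else None)
    return profile


# Hypothetical toy data: the sample differs from the reference at index 5.
reference = "ACGTACCTGAAT"
sample_read = "ACGTAGCTGAAT"
profile = coverage_profile(reference, [sample_read], k=4)
```

    In this toy case the profile is zero for the four k-mers that start at positions 2 to 5, which brackets the changed base at position 5. A real analysis would work with many reads, larger k and sequencing-error handling, none of which are modelled here.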

  12. Face recognition using elastic grid matching through photoshop: A new approach

    Directory of Open Access Journals (Sweden)

    Manavpreet Kaur

    2015-12-01

    Full Text Available Computing grids propose to be a very efficacious, economic and ascendable way of image identification. In this paper, we propose a grid based face recognition overture employing a general template matching method to solve the time-consuming face recognition problem. A new approach has been employed in which the grid was prepared for a specific individual over his photograph using Adobe Photoshop CS5 software. The background was later removed and the grid prepared by merging layers was used as a template for image matching or comparison. This overture is computationally efficient, has high recognition rates and is able to identify a person with minimal effort and in a short time even from photographs taken at different magnifications and from different distances.

  13. Indonesian name matching using machine learning supervised approach

    Science.gov (United States)

    Alifikri, Mohamad; Arif Bijaksana, Moch.

    2018-03-01

    Most existing name matching methods are developed for the English language and so they cover the characteristics of this language. Up to this moment, no method has been designed and implemented specifically for Indonesian names. The purpose of this thesis is to develop an Indonesian name matching dataset as a contribution to academic research and to propose a suitable feature set by utilizing a combination of the context of name strings and their permute-winkler score. Machine learning classification algorithms are used as the method for performing name matching. Based on the experiments, by using a tuned Random Forest algorithm and the proposed features, matching performance improves by approximately 1.7% and misclassifications are reduced by up to 70% relative to state-of-the-art methods. This improved performance makes the matching system more effective and reduces the risk of misclassified matches.

  14. An approach to improve the match-on-card fingerprint authentication system security

    CSIR Research Space (South Africa)

    Nair, Kishor Krishnan

    2016-07-01

    Full Text Available -on-Card (TOC), Match-on-Card (MOC), Work-Sharing On-Card (WSOC), and System-on-Card (SOC). Out of these four approaches, the SOC is considered as the most secure and expensive, whereas the TOC is considered as the least secure and least expensive. The MOC...

  15. An Approach to Improve the Match-on-Card Fingerprint Authentication System Security

    CSIR Research Space (South Africa)

    Nair, Kishor Krishnan

    2016-08-18

    Full Text Available -on-Card (TOC), Match-on-Card (MOC), Work-Sharing On-Card (WSOC), and System-on-Card (SOC). Out of these four approaches, the SOC is considered as the most secure and expensive, whereas the TOC is considered as the least secure and least expensive. The MOC...

  16. A BAC clone fingerprinting approach to the detection of human genome rearrangements

    Science.gov (United States)

    Krzywinski, Martin; Bosdet, Ian; Mathewson, Carrie; Wye, Natasja; Brebner, Jay; Chiu, Readman; Corbett, Richard; Field, Matthew; Lee, Darlene; Pugh, Trevor; Volik, Stas; Siddiqui, Asim; Jones, Steven; Schein, Jacquie; Collins, Collin; Marra, Marco

    2007-01-01

    We present a method, called fingerprint profiling (FPP), that uses restriction digest fingerprints of bacterial artificial chromosome clones to detect and classify rearrangements in the human genome. The approach uses alignment of experimental fingerprint patterns to in silico digests of the sequence assembly and is capable of detecting micro-deletions (1-5 kb) and balanced rearrangements. Our method has compelling potential for use as a whole-genome method for the identification and characterization of human genome rearrangements. PMID:17953769
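    The comparison between experimental fingerprints and in silico digests described above can be sketched as follows. This is only an illustration under stated assumptions, not the FPP software: the function names, the size tolerance and the toy sequence are hypothetical. The only biological fact relied on is that EcoRI recognizes GAATTC and cuts after the G.

```python
def in_silico_digest(sequence, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of a restriction site.

    cut_offset is the cut position relative to the start of the site
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def unmatched_fragments(observed, predicted, tolerance):
    """Pair observed and predicted fragment sizes within a size tolerance.

    Returns (observed sizes with no partner, predicted sizes with no partner).
    """
    remaining = sorted(predicted)
    missing = []
    for size in sorted(observed):
        hit = next((p for p in remaining if abs(p - size) <= tolerance), None)
        if hit is None:
            missing.append(size)
        else:
            remaining.remove(hit)
    return missing, remaining


# Hypothetical toy clone: two EcoRI sites give three predicted fragments.
predicted = in_silico_digest("AAGAATTCCCCGAATTCTT", "GAATTC", cut_offset=1)
# An observed fingerprint in which two predicted fragments appear as one.
extra, lost = unmatched_fragments([3, 16], predicted, tolerance=1)
```

    Here the predicted fragments of 9 and 7 bases have no observed partner, while an unexplained observed fragment of 16 bases appears, the kind of discrepancy that would flag a candidate rearrangement or lost site for closer inspection.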

  17. An Alternative Methodological Approach for Cost-Effectiveness Analysis and Decision Making in Genomic Medicine.

    Science.gov (United States)

    Fragoulakis, Vasilios; Mitropoulou, Christina; van Schaik, Ron H; Maniadakis, Nikolaos; Patrinos, George P

    2016-05-01

    Genomic Medicine aims to improve therapeutic interventions and diagnostics, the quality of life of patients, but also to rationalize healthcare costs. To reach this goal, careful assessment and identification of evidence gaps for public health genomics priorities are required so that a more efficient healthcare environment is created. Here, we propose a public health genomics-driven approach to adjust the classical healthcare decision making process with an alternative methodological approach of cost-effectiveness analysis, which is particularly helpful for genomic medicine interventions. By combining classical cost-effectiveness analysis with budget constraints, social preferences, and patient ethics, we demonstrate the application of this model, the Genome Economics Model (GEM), based on a previously reported genome-guided intervention from a developing country environment. The model and the attendant rationale provide a practical guide by which all major healthcare stakeholders could ensure the sustainability of funding for genome-guided interventions, their adoption and coverage by health insurance funds, and prioritization of Genomic Medicine research, development, and innovation, given the restriction of budgets, particularly in developing countries and low-income healthcare settings in developed countries. The implications of the GEM for the policy makers interested in Genomic Medicine and new health technology and innovation assessment are also discussed.

  18. A LDA-based approach to promoting ranking diversity for genomics information retrieval.

    Science.gov (United States)

    Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji

    2012-06-11

    In the biomedical domain, there are immense data and a tremendous increase in genomics and biomedical relevant publications. The wealth of information has led to an increasing amount of interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired information of a query asked by biologists is a list of a certain type of entities covering different aspects that are related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important for a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However, traditional IR models consider only the relevance between retrieved documents and the user query, and do not take redundancy between retrieved documents into account. This leads to high redundancy and low diversity in the retrieved ranked lists. In this paper, we propose an approach which employs a topic generative model called Latent Dirichlet Allocation (LDA) to promote ranking diversity for biomedical information retrieval. Different from other approaches or models which consider aspects on the word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We present an LDA model to discover the topic distribution of retrieved passages and the word distribution of each topic dimension, and then re-rank retrieval results with topic distribution similarity between passages based on a sliding window of size N. We evaluate our approach on the TREC 2007 Genomics collection and two distinctive IR baseline runs, which can achieve 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track. The proposed method is the first study of adopting a topic model for genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving relevance of ranked lists of genomics search

  19. Horsetail matching: a flexible approach to optimization under uncertainty

    Science.gov (United States)

    Cook, L. W.; Jarrett, J. P.

    2018-04-01

    It is important to design engineering systems to be robust with respect to uncertainties in the design process. Often, this is done by considering statistical moments, but over-reliance on statistical moments when formulating a robust optimization can produce designs that are stochastically dominated by other feasible designs. This article instead proposes a formulation for optimization under uncertainty that minimizes the difference between a design's cumulative distribution function and a target. A standard target is proposed that produces stochastically non-dominated designs, but the formulation also offers enough flexibility to recover existing approaches for robust optimization. A numerical implementation is developed that employs kernels to give a differentiable objective function. The method is applied to algebraic test problems and a robust transonic airfoil design problem where it is compared to multi-objective, weighted-sum and density matching approaches to robust optimization; several advantages over these existing methods are demonstrated.
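    The idea of scoring a design by how far its distribution lies from a target can be sketched with empirical quantiles. This is a simplified, non-differentiable illustration and not the article's formulation, which uses kernels to obtain a differentiable objective; the function names, the plotting positions and the toy target are assumptions.

```python
def empirical_quantiles(samples):
    """Pair each sorted sample with a mid-point cumulative probability h."""
    ordered = sorted(samples)
    n = len(ordered)
    return [((i + 0.5) / n, q) for i, q in enumerate(ordered)]


def cdf_mismatch(samples, target_quantile):
    """Root-mean-square gap between the empirical quantile curve of the
    samples and a target curve given as a function of probability h."""
    pairs = empirical_quantiles(samples)
    squared = sum((q - target_quantile(h)) ** 2 for h, q in pairs)
    return (squared / len(pairs)) ** 0.5


# Hypothetical outputs of two designs evaluated under the same uncertainties.
design_a = [1.0, 1.2, 0.9, 1.1]
design_b = [0.5, 2.5, 0.4, 2.6]


# Assumed target: the quantity of interest is zero at every probability level.
def ideal(h):
    return 0.0


better = min((design_a, design_b), key=lambda s: cdf_mismatch(s, ideal))
```

    With this assumed target, design_a is preferred because its whole quantile curve lies closer to zero. An optimizer would minimize this mismatch over the design variables rather than compare two fixed sample sets.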

  20. Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.

    Directory of Open Access Journals (Sweden)

    Marnix H Medema

    2014-09-01

    Full Text Available Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.

  1. Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

    Directory of Open Access Journals (Sweden)

    Burgess Shane C

    2008-04-01

    Full Text Available Abstract Background: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results: This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion: Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
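    For orientation, the traditional software form of the Aho-Corasick algorithm that the abstract uses as its baseline can be sketched as below. This is the textbook single-automaton version, not the bit-split FPGA design described in the paper; the function names and the toy peptides are illustrative assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first traversal so that shallower failure targets are ready.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# Hypothetical peptides searched in a hypothetical translated frame.
hits = find_all("MKVLAAKVL", ["KVL", "VLA", "AA"])
```

    The text is scanned once regardless of how many patterns are loaded, which is the property that makes the algorithm attractive for matching large peptide sets against a translated genome.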

  2. Prokaryote genome fluidity: toward a system approach of the mobilome.

    Science.gov (United States)

    Toussaint, Ariane; Chandler, Mick

    2012-01-01

    The importance of horizontal/lateral gene transfer (LGT) in shaping the genomes of prokaryotic organisms has been recognized in recent years as a result of analysis of the increasing number of available genome sequences. LGT is largely due to the transfer and recombination activities of mobile genetic elements (MGEs). Bacterial and archaeal genomes are mosaics of vertically and horizontally transmitted DNA segments. This generates reticulate relationships between members of the prokaryotic world that are better represented by networks than by "classical" phylogenetic trees. In this review we summarize the nature and activities of MGEs, and the problems that presently limit their analysis on a large scale. We propose routes to improve their annotation in the flow of genomic and metagenomic sequences that currently exist and those that become available. We describe network analysis of evolutionary relationships among some MGE categories and sketch out possible developments of this type of approach to get more insight into the role of the mobilome in bacterial adaptation and evolution.

  3. A New Approach to Dissect Nuclear Organization: TALE-Mediated Genome Visualization (TGV).

    Science.gov (United States)

    Miyanari, Yusuke

    2016-01-01

    Spatiotemporal organization of chromatin within the nucleus has so far remained elusive. Live visualization of nuclear remodeling could be a promising approach to understand its functional relevance in genome functions and mechanisms regulating genome architecture. Recent technological advances in live imaging of chromosomes have begun to explore the biological roles of the movement of the chromatin within the nucleus. Here I describe a new technique, called TALE-mediated genome visualization (TGV), which allows us to visualize endogenous repetitive sequences including centromeric, pericentromeric, and telomeric repeats in living cells.

  4. Functional Genomics Approaches to Studying Symbioses between Legumes and Nitrogen-Fixing Rhizobia.

    Science.gov (United States)

    Lardi, Martina; Pessi, Gabriella

    2018-05-18

    Biological nitrogen fixation gives legumes a pronounced growth advantage in nitrogen-deprived soils and is of considerable ecological and economic interest. In exchange for reduced atmospheric nitrogen, typically given to the plant in the form of amides or ureides, the legume provides nitrogen-fixing rhizobia with nutrients and highly specialised root structures called nodules. To elucidate the molecular basis underlying physiological adaptations on a genome-wide scale, functional genomics approaches, such as transcriptomics, proteomics, and metabolomics, have been used. This review presents an overview of the different functional genomics approaches that have been performed on rhizobial symbiosis, with a focus on studies investigating the molecular mechanisms used by the bacterial partner to interact with the legume. While rhizobia belonging to the alpha-proteobacterial group (alpha-rhizobia) have been well studied, few studies to date have investigated this process in beta-proteobacteria (beta-rhizobia).

  5. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    Science.gov (United States)

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  6. Accounting for linkage disequilibrium in genome scans for selection without individual genotypes: The local score approach.

    Science.gov (United States)

    Fariello, María Inés; Boitard, Simon; Mercier, Sabine; Robelin, David; Faraut, Thomas; Arnould, Cécile; Recoquillay, Julien; Bouchez, Olivier; Salin, Gérald; Dehais, Patrice; Gourichon, David; Leroux, Sophie; Pitel, Frédérique; Leterrier, Christine; SanCristobal, Magali

    2017-07-01

    Detecting genomic footprints of selection is an important step in the understanding of evolution. Accounting for linkage disequilibrium in genome scans increases detection power, but haplotype-based methods require individual genotypes and are not applicable on pool-sequenced samples. We propose to take advantage of the local score approach to account for linkage disequilibrium in genome scans for selection, cumulating (possibly small) signals from single markers over a genomic segment, to clearly pinpoint a selection signal. Using computer simulations, we demonstrate that this approach detects selection with higher power than several state-of-the-art single-marker, windowing or haplotype-based approaches. We illustrate this on two benchmark data sets including individual genotypes, for which we obtain similar results with the local score and one haplotype-based approach. Finally, we apply the local score approach to Pool-Seq data obtained from a divergent selection experiment on behaviour in quail and obtain precise and biologically coherent selection signals: while competing methods fail to highlight any clear selection signature, our method detects several regions involving genes known to act on social responsiveness or autistic traits. Although we focus here on the detection of positive selection from multiple population data, the local score approach is general and can be applied to other genome scans for selection or other genomewide analyses such as GWAS. © 2017 John Wiley & Sons Ltd.
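    The cumulation of small single-marker signals over a segment can be sketched as a running sum that is reset whenever it becomes negative. The scoring used here, minus log10 of the p-value minus a threshold xi, and the function name are assumptions made for illustration; the abstract does not spell out the scoring or the significance procedure, and neither is reproduced here.

```python
import math


def local_score(p_values, xi=1.0):
    """Best-scoring contiguous segment of markers.

    Each marker contributes -log10(p) - xi, so only markers with
    p < 10**-xi push the running score upward. Returns a tuple
    (score, first_marker_index, last_marker_index).
    """
    running = 0.0
    start = 0
    best = (0.0, None, None)
    for i, p in enumerate(p_values):
        if running == 0.0:
            start = i  # a new candidate segment begins here
        running = max(0.0, running + (-math.log10(p) - xi))
        if running > best[0]:
            best = (running, start, i)
    return best


# Hypothetical single-marker p-values ordered along a chromosome.
score, first, last = local_score([0.5, 0.01, 0.05, 0.01, 0.5], xi=1.0)
```

    In this toy example the three central markers are combined into one segment (indices 1 to 3) even though the middle one would be unremarkable on its own, which is the behaviour the abstract attributes to accounting for linkage between neighbouring markers.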

  7. A Polygon and Point-Based Approach to Matching Geospatial Features

    Directory of Open Access Journals (Sweden)

    Juan J. Ruiz-Lendínez

    2017-12-01

    Full Text Available A methodology for matching bidimensional entities is presented in this paper. The matching is proposed for both area and point features extracted from geographical databases. The procedure used to obtain homologous entities is achieved in a two-step process: The first matching, polygon to polygon matching (inter-element matching, is obtained by means of a genetic algorithm that allows the classifying of area features from two geographical databases. After this, we apply a point to point matching (intra-element matching based on the comparison of changes in their turning functions. This study shows that genetic algorithms are suitable for matching polygon features even if these features are quite different. Our results show up to 40% of matched polygons with differences in geometrical attributes. With regards to point matching, the vertex from homologous polygons, the function and threshold values proposed in this paper show a useful method for obtaining precise vertex matching.

  8. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  9. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species.

    Science.gov (United States)

    Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  10. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
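    Recall and precision, the two validation measures quoted above, can be computed from sets of gene-disease pairs as in the following sketch. The pair labels are hypothetical placeholders and are not taken from GeneRIF, OMIM or the validation collection.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted annotations against a reference set."""
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of predicted pairs that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference pairs that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (gene, disease) pairs for illustration only.
predicted_pairs = {("GENE_A", "disease_1"), ("GENE_B", "disease_2"), ("GENE_C", "disease_3")}
reference_pairs = {("GENE_A", "disease_1"), ("GENE_B", "disease_2"),
                   ("GENE_D", "disease_4"), ("GENE_E", "disease_5")}
precision, recall = precision_recall(predicted_pairs, reference_pairs)
```

    A source that annotates few pairs but almost always correctly scores high on precision and low on recall, which is the pattern the abstract reports for OMIM.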

  11. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Science.gov (United States)

    Greub, Gilbert; Kebbi-Beghdadi, Carole; Bertelli, Claire; Collyn, François; Riederer, Beat M; Yersin, Camille; Croxatto, Antony; Raoult, Didier

    2009-12-23

    With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  12. HANDBOOK OF SOCCER MATCH ANALYSIS: A SYSTEMATIC APPROACH TO IMPROVING PERFORMANCE

    Directory of Open Access Journals (Sweden)

    Christopher Carling

    2006-03-01

    Analysis Tells Us about Successful Strategy and Tactics in Soccer, 8. From Technical and Tactical Performance Analysis to Training Drills, 9. The Future of Soccer Match Analysis. ASSESSMENT The authors have assembled an essential reading for all who are interested in understanding and doing better coaching and improving the performance in soccer. To this purpose, there is a strong practical approach in the book by giving plenty of examples along with a satisfactory scientific analysis of the subject area. It is concise and well organized in its presentation, creating an effective textbook. I believe, therefore, the book will serve as a first-rate teaching tool and reference for coaches, athletes and professionals in the human performance sciences.

  13. Efficient line matching with homography

    Science.gov (United States)

    Shen, Yan; Dai, Yuxing; Zhu, Zhiliang

    2018-03-01

    In this paper, we propose a novel approach to line matching based on homography. The basic idea is to use cheaply obtainable matched points to boost the similarity between two images. Two types of homography method, which are estimated by direct linear transformation, transform images and extract their similar parts, laying a foundation for the use of optical flow tracking. The merit of the similarity is that rapid matching can be achieved by regionalizing line segments and local searching. For multiple homography estimation that can perform better than one global homography, we introduced the rank-one modification method of singular value decomposition to reduce the computation cost. The proposed approach results in point-to-point matches, which can be utilized with state-of-the-art point-match-based structures from motion (SfM) frameworks seamlessly. The outstanding performance and feasible robustness of our approach are demonstrated in this paper.

  14. AN INTEGRATED RANSAC AND GRAPH BASED MISMATCH ELIMINATION APPROACH FOR WIDE-BASELINE IMAGE MATCHING

    Directory of Open Access Journals (Sweden)

    M. Hasheminasab

    2015-12-01

    In this paper we propose an integrated approach to increase the precision of feature point matching. Many algorithms have been developed to optimize short-baseline image matching, whereas wide-baseline image matching remains difficult to handle because of illumination differences and viewpoint changes. Fortunately, recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. Matching algorithms based on the local feature similarity principle use feature descriptors to establish correspondence between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor, which is invariant to image rotation and scale and remains robust across a substantial range of affine distortion, presence of noise, and changes in illumination. The epipolar constraint based on the RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, a few false matches remain in the matching results selected on the basis of epipolar geometry and RANSAC. Aguilariu et al. proposed the Graph Transformation Matching (GTM) algorithm to remove outliers, which has some difficulties when the mismatched points are surrounded by the same local neighbor structure. In this study, to overcome the limitations mentioned above, a new three-step matching scheme is presented in which the SIFT algorithm is used to obtain initial corresponding point sets. In the second step, the RANSAC algorithm is applied to reduce the outliers. Finally, the GTM, based on the adjacent K-NN graph, is implemented to remove the remaining mismatches. Four different close-range image datasets with changes in viewpoint are utilized to evaluate the performance of the proposed method, and the experimental results indicate its robustness and

  15. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Directory of Open Access Journals (Sweden)

    Gilbert Greub

    BACKGROUND: With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. METHODS/PRINCIPAL FINDINGS: We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. CONCLUSIONS/SIGNIFICANCE: This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  16. Assessing the short term impact of air pollution on mortality: a matching approach.

    Science.gov (United States)

    Baccini, Michela; Mattei, Alessandra; Mealli, Fabrizia; Bertazzi, Pier Alberto; Carugno, Michele

    2017-02-10

    The opportunity to assess short term impact of air pollution relies on the causal interpretation of the exposure-response association. However, up to now few studies explicitly faced this issue within a causal inference framework. In this paper, we reformulated the problem of assessing the short term impact of air pollution on health using the potential outcome approach to causal inference. We considered the impact of high daily levels of particulate matter ≤10 μm in diameter (PM10) on mortality within two days from the exposure in the metropolitan area of Milan (Italy), during the period 2003-2006. Our research focus was the causal impact of a hypothetical intervention setting daily air pollution levels under a pre-fixed threshold. We applied a matching procedure based on propensity score to estimate the total number of attributable deaths (AD) during the study period. After defining the number of attributable deaths in terms of difference between potential outcomes, we used the estimated propensity score to match each high exposure day, namely each day with a level of exposure higher than 40 μg/m³, with a day with similar background characteristics but a level of exposure lower than 40 μg/m³. Then, we estimated the impact by comparing mortality between matched days. During the study period daily exposures larger than 40 μg/m³ were responsible for 1079 deaths (90% CI: 116; 2042). The impact was more evident among the elderly than in the younger age classes. Exposures ≥ 40 μg/m³ were responsible, among the elderly, for 1102 deaths (90% CI: 388, 1816), of which 797 from cardiovascular causes and 243 from respiratory causes. Clear evidence of an impact on respiratory mortality was found also in the age class 65-74, with 87 AD (90% CI: 11, 163). The propensity score matching turned out to be an appealing method to assess historical impacts in this field, which guarantees that the estimated total number of AD can be derived directly as sum

  17. New Approaches and Technologies to Sequence de novo Plant reference Genomes (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy

    2013-03-01

    Jeremy Schmutz of the HudsonAlpha Institute for Biotechnology on New approaches and technologies to sequence de novo plant reference genomes at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  18. The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics

    Directory of Open Access Journals (Sweden)

    Levasseur Anthony

    2011-02-01

    Understanding the evolutionary plasticity of the genome requires a global, comparative approach in which genetic events are considered both in a phylogenetic framework and with regard to population genetics and environmental variables. In the mechanisms that generate adaptive and non-adaptive changes in genomes, segmental duplications (duplication of individual genes or genomic regions) and polyploidization (whole genome duplications) are well-known driving forces. The probability of fixation and maintenance of duplicates depends on many variables, including population sizes and selection regimes experienced by the corresponding genes: a combination of stochastic and adaptive mechanisms has shaped all genomes. A survey of experimental work shows that the distinction made between fixation and maintenance of duplicates still needs to be conceptualized and mathematically modeled. Here we review the mechanisms that increase or decrease the probability of fixation or maintenance of duplicated genes, and examine the outcome of these events on the adaptation of the organisms. Reviewers: This article was reviewed by Dr. Etienne Joly, Dr. Lutz Walter and Dr. W. Ford Doolittle.

  19. Matching Two-dimensional Gel Electrophoresis' Spots

    DEFF Research Database (Denmark)

    Dos Anjos, António; AL-Tam, Faroq; Shahbazkia, Hamid Reza

    2012-01-01

    This paper describes an approach for matching Two-Dimensional Electrophoresis (2-DE) gels' spots, involving the use of image registration. The number of false positive matches produced by the proposed approach is small, when compared to academic and commercial state-of-the-art approaches. This ar...

  20. A viral metagenomic approach on a nonmetagenomic experiment

    DEFF Research Database (Denmark)

    Bovo, Samuele; Mazzoni, Gianluca; Ribani, Anisa

    2017-01-01

    Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent...... reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine...... unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses...

  1. Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA.

    Science.gov (United States)

    Skvortsova, Ksenia; Zotenko, Elena; Luu, Phuc-Loi; Gould, Cathryn M; Nair, Shalima S; Clark, Susan J; Stirzaker, Clare

    2017-01-01

    The discovery that 5-methylcytosine (5mC) can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) proteins has prompted wide interest in the potential role of 5hmC in reshaping the mammalian DNA methylation landscape. The gold-standard bisulphite conversion technologies to study DNA methylation do not distinguish between 5mC and 5hmC. However, new approaches to mapping 5hmC genome-wide have advanced rapidly, although it is unclear how the different methods compare in accurately calling 5hmC. In this study, we provide a comparative analysis on brain DNA using three 5hmC genome-wide approaches, namely whole-genome bisulphite/oxidative bisulphite sequencing (WG Bis/OxBis-seq), Infinium HumanMethylation450 BeadChip arrays coupled with oxidative bisulphite (HM450K Bis/OxBis) and antibody-based immunoprecipitation and sequencing of hydroxymethylated DNA (hMeDIP-seq). We also perform loci-specific TET-assisted bisulphite sequencing (TAB-seq) for validation of candidate regions. We show that whole-genome single-base resolution approaches are advantaged in providing precise 5hmC values but require high sequencing depth to accurately measure 5hmC, as this modification is commonly in low abundance in mammalian cells. HM450K arrays coupled with oxidative bisulphite provide a cost-effective representation of 5hmC distribution, at CpG sites with 5hmC levels >~10%. However, 5hmC analysis is restricted to the genomic location of the probes, which is an important consideration as 5hmC modification is commonly enriched at enhancer elements. Finally, we show that the widely used hMeDIP-seq method provides an efficient genome-wide profile of 5hmC and shows high correlation with WG Bis/OxBis-seq 5hmC distribution in brain DNA. However, in cell line DNA with low levels of 5hmC, hMeDIP-seq-enriched regions are not detected by WG Bis/OxBis or HM450K, either suggesting misinterpretation of 5hmC calls by hMeDIP or lack of sensitivity of the latter methods. We

  2. Comparisons of single-stage and two-stage approaches to genomic selection.

    Science.gov (United States)

    Schulz-Streeck, Torben; Ogutu, Joseph O; Piepho, Hans-Peter

    2013-01-01

    Genomic selection (GS) is a method for predicting breeding values of plants or animals using many molecular markers that is commonly implemented in two stages. In plant breeding the first stage usually involves computation of adjusted means for genotypes which are then used to predict genomic breeding values in the second stage. We compared two classical stage-wise approaches, which either ignore or approximate correlations among the means by a diagonal matrix, and a new method, to a single-stage analysis for GS using ridge regression best linear unbiased prediction (RR-BLUP). The new stage-wise method rotates (orthogonalizes) the adjusted means from the first stage before submitting them to the second stage. This makes the errors approximately independently and identically normally distributed, which is a prerequisite for many procedures that are potentially useful for GS such as machine learning methods (e.g. boosting) and regularized regression methods (e.g. lasso). This is illustrated in this paper using componentwise boosting. The componentwise boosting method minimizes squared error loss using least squares and iteratively and automatically selects markers that are most predictive of genomic breeding values. Results are compared with those of RR-BLUP using fivefold cross-validation. The new stage-wise approach with rotated means was slightly more similar to the single-stage analysis than the classical two-stage approaches based on non-rotated means for two unbalanced datasets. This suggests that rotation is a worthwhile pre-processing step in GS for the two-stage approaches for unbalanced datasets. Moreover, the predictive accuracy of stage-wise RR-BLUP was higher (5.0-6.1%) than that of componentwise boosting.
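    Componentwise boosting as described above (squared error loss, one marker selected per iteration by least squares) can be sketched in a few lines. This is a generic textbook-style version under stated assumptions: marker columns are taken to be centered, and the step length `nu` and the number of steps are hypothetical tuning values, not those used in the study.

```python
def componentwise_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2 boosting with single-marker least-squares base learners.

    X is a list of rows (individuals) with one column per marker; columns are
    assumed to be centered. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_b, best_gain = None, 0.0, 0.0
        for j in range(p):
            sxx = sum(X[i][j] ** 2 for i in range(n))
            if sxx == 0.0:
                continue  # monomorphic marker, nothing to fit
            sxr = sum(X[i][j] * resid[i] for i in range(n))
            b = sxr / sxx    # least-squares slope of residuals on marker j
            gain = b * sxr   # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain
        if best_j is None:
            break  # no marker reduces the squared error any further
        # Update only the selected marker, shrunk by the step length nu.
        coef[best_j] += nu * best_b
        resid = [resid[i] - nu * best_b * X[i][best_j] for i in range(n)]
    return intercept, coef


# Hypothetical centered marker matrix: only the first marker affects y.
X = [[-1, 1], [-1, -1], [1, 1], [1, -1]]
y = [3.0, 3.0, 7.0, 7.0]
intercept, coef = componentwise_boost(X, y, n_steps=200, nu=0.1)
```

    Markers that are never selected keep a coefficient of exactly zero, which is how the procedure performs automatic marker selection.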

  3. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Directory of Open Access Journals (Sweden)

    Rodrigo Aniceto

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  4. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Science.gov (United States)

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  5. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    Science.gov (United States)

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  6. A genome-wide approach to children's aggressive behavior: The EAGLE consortium

    NARCIS (Netherlands)

    Pappa, I.; St Pourcain, B.; Benke, K.S.; Cavadino, A.; Hakulinen, C.; Nivard, M.G.; Nolte, I.M.; Tiesler, C.M.T.; Bakermans-Kranenburg, M.J.; Davies, G.E.; Evans, D.M.; Geoffroy, M.C.; Grallert, H.; Blokhuis, M.M.; Hudziak, J.J.; Kemp, J.P.; Keltikangas-Järvinen, L.; McMahon, G.; Mileva-Seitz, V.R.; Motazedi, E.; Power, C.; Raitakari, O.T.; Ring, S.M.; Rivadeneira, F.; Rodriguez, A.; Scheet, P.; Seppälä, I.; Snieder, H.; Standl, M.; Thiering, E.; Timpson, N.J.; Veenstra, R.; Velders, F.P.; Whitehouse, A.J.O.; Davey Smith, G.; Heinrich, J.; Hypponen, E.; Lehtimäki, T.; Middeldorp, C.M.; Oldehinkel, A.J.; Pennell, C.E.; Boomsma, D.I.; Tiemeier, H.

    2016-01-01

    Individual differences in aggressive behavior emerge in early childhood and predict persisting behavioral problems and disorders. Studies of antisocial and severe aggression in adulthood indicate substantial underlying biology. However, little attention has been given to genome-wide approaches of

  7. A hybrid reference-guided de novo assembly approach for generating Cyclospora mitochondrion genomes.

    Science.gov (United States)

    Gopinath, G R; Cinar, H N; Murphy, H R; Durigan, M; Almeria, M; Tall, B D; DaSilva, A J

    2018-01-01

    Cyclospora cayetanensis is a coccidian parasite associated with large and complex foodborne outbreaks worldwide. Linking samples from cyclosporiasis patients during foodborne outbreaks with suspected contaminated food sources, using conventional epidemiological methods, has been a persistent challenge. To address this issue, development of new methods based on potential genomically-derived markers for strain-level identification has been a priority for the food safety research community. The absence of reference genomes to identify nucleotide and structural variants with a high degree of confidence has limited the application of using sequencing data for source tracking during outbreak investigations. In this work, we determined the quality of a high resolution, curated, public mitochondrial genome assembly to be used as a reference genome by applying bioinformatic analyses. Using this reference genome, three new mitochondrial genome assemblies were built starting with metagenomic reads generated by sequencing DNA extracted from oocysts present in stool samples from cyclosporiasis patients. Nucleotide variants were identified in the new and other publicly available genomes in comparison with the mitochondrial reference genome. A consolidated workflow, presented here, to generate new mitochondrion genomes using our reference-guided de novo assembly approach could be useful in facilitating the generation of other mitochondrion sequences, and in their application for subtyping C. cayetanensis strains during foodborne outbreak investigations.

  8. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  9. Individual match approach to Bowling performance measures in ...

    African Journals Online (AJOL)

    Match conditions can play a significant role in player performances in a cricket match. If the pitch is in a good condition, the batsmen can achieve good scores, making it difficult for the bowlers. In the case of an uneven pitch or adverse weather conditions, the bowlers may have the upper hand. In order to measure bowlers' ...

  10. A Review on Block Matching Motion Estimation and Automata Theory based Approaches for Fractal Coding

    Directory of Open Access Journals (Sweden)

    Shailesh Kamble

    2016-12-01

    Fractal compression is a lossy compression technique for gray/color image and video compression. It gives a high compression ratio and better image quality with fast decoding time, but improving the encoding time remains a challenge. This review paper presents an analysis of the most significant existing approaches in the field of fractal-based gray/color image and video compression, different block matching motion estimation approaches for finding the motion vectors in a frame based on inter-frame coding and intra-frame coding (i.e., individual frame coding), and automata theory based coding approaches to represent an image or sequence of images. Though different review papers exist related to fractal coding, this paper differs in many respects. One can develop a new shape pattern for motion estimation and modify the existing block matching motion estimation with automata coding to explore the fractal compression technique, with specific focus on reducing the encoding time and achieving better image/video reconstruction quality. This paper is useful for beginners in the domain of video compression.
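
    As a rough illustration of block matching motion estimation, the sketch below runs an exhaustive search that minimizes the sum of absolute differences (SAD) over a small window. The function names, block size, search range, and synthetic frames are assumptions made for this example; the review does not prescribe this particular search pattern.

```python
import random


def sad(ref, cur, ry, rx, cy, cx, block):
    """Sum of absolute differences between a block of ref and a block of cur."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(block)
        for j in range(block)
    )


def best_motion_vector(ref, cur, top, left, block=4, search=3):
    """Exhaustive search: return ((dy, dx), cost) pointing from the block of
    cur at (top, left) to its best-matching block in the reference frame."""
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue
            cost = sad(ref, cur, y, x, top, left, block)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Hypothetical frames: cur is ref shifted down by 2 rows and left by 3 columns.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
cur = [[ref[(y - 2) % 16][(x + 3) % 16] for x in range(16)] for y in range(16)]
vector, cost = best_motion_vector(ref, cur, top=6, left=6)
```

    Exhaustive search is the simplest baseline; faster search patterns visit fewer candidate displacements at the risk of missing the best match.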

  11. A “Forward Genomics” Approach Links Genotype to Phenotype using Independent Phenotypic Losses among Related Species

    Directory of Open Access Journals (Sweden)

    Michael Hiller

    2012-10-01

    Genotype-phenotype mapping is hampered by countless genomic changes between species. We introduce a computational “forward genomics” strategy that, given only an independently lost phenotype and whole genomes, matches genomic and phenotypic loss patterns to associate specific genomic regions with this phenotype. We conducted genome-wide screens for two metabolic phenotypes. First, our approach correctly matches the inactivated Gulo gene exactly with the species that lost the ability to synthesize vitamin C. Second, we attribute naturally low biliary phospholipid levels in guinea pigs and horses to the inactivated phospholipid transporter Abcb4. Human ABCB4 mutations also result in low phospholipid levels but lead to severe liver disease, suggesting compensatory mechanisms in guinea pig and horse. Our simulation studies, counts of independent changes in existing phenotype surveys, and the forthcoming availability of many new genomes all suggest that forward genomics can be applied to many phenotypes, including those relevant for human evolution and disease.
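
    The idea of matching genomic and phenotypic loss patterns can be sketched in a few lines. This is a toy version that assumes loss calls are already available as sets of species; the species and region labels are hypothetical placeholders, and the published method works on whole-genome alignments rather than precomputed sets.

```python
def matching_regions(phenotype_lost, region_lost):
    """Return the regions whose set of 'lost' species equals the set of
    species that lost the phenotype (an exact pattern match)."""
    target = frozenset(phenotype_lost)
    return sorted(
        region
        for region, species in region_lost.items()
        if frozenset(species) == target
    )


# Hypothetical input: species that lost the trait, and per-region loss calls.
phenotype_lost = {"species_a", "species_c"}
region_lost = {
    "region_1": {"species_a", "species_c"},  # same pattern as the phenotype
    "region_2": {"species_a"},               # lost in only one of the species
    "region_3": set(),                       # intact everywhere
}
candidates = matching_regions(phenotype_lost, region_lost)
```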

  12. Word-level recognition of multifont Arabic text using a feature vector matching approach

    Science.gov (United States)

    Erlandson, Erik J.; Trenkle, John M.; Vogt, Robert C., III

    1996-03-01

    Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition system for machine-printed Arabic text has been implemented. Arabic is a script language, and is therefore difficult to segment at the character level. Character segmentation has been avoided by recognizing text imagery of complete words. The Arabic recognition system computes a vector of image-morphological features on a query word image. This vector is matched against a precomputed database of vectors from a lexicon of Arabic words. Vectors from the database with the highest match score are returned as hypotheses for the unknown image. Several feature vectors may be stored for each word in the database. Database feature vectors generated using multiple fonts and noise models allow the system to be tuned to its input stream. Used in conjunction with database pruning techniques, this Arabic recognition system has obtained promising word recognition rates on low-quality multifont text imagery.
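
    The matching step described above (a query feature vector scored against a precomputed database, with several stored vectors per word) might look roughly like the following. Cosine similarity is an assumed match score, and the word labels and three-dimensional vectors are placeholders; the abstract does not specify the actual features or scoring function.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_hypotheses(query, database, k=2):
    """Score the query against every stored vector, keep the best score per
    word, and return the k highest-scoring (word, score) pairs."""
    best = {}
    for word, vector in database:
        score = cosine(query, vector)
        if word not in best or score > best[word]:
            best[word] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical database: "word_a" has two stored vectors (e.g. two fonts).
database = [
    ("word_a", [1.0, 0.0, 0.0]),
    ("word_a", [0.9, 0.1, 0.0]),
    ("word_b", [0.0, 1.0, 0.0]),
    ("word_c", [0.0, 0.0, 1.0]),
]
hypotheses = top_hypotheses([0.8, 0.2, 0.0], database)
```

    Keeping only the best score per word is one way to let multiple fonts or noise models vote for the same lexicon entry without crowding the hypothesis list.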

  13. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  14. A quasi-dense matching approach and its calibration application with Internet photos.

    Science.gov (United States)

    Wan, Yanli; Miao, Zhenjiang; Wu, Q M Jonathan; Wang, Xifu; Tang, Zhen; Wang, Zhifei

    2015-03-01

    This paper proposes a quasi-dense matching approach to the automatic acquisition of camera parameters, which is required for recovering 3-D information from 2-D images. An affine transformation-based optimization model and a new matching cost function are used to acquire quasi-dense correspondences with high accuracy in each pair of views. These correspondences can be effectively detected and tracked at the sub-pixel level in multiviews with our neighboring view selection strategy. A two-layer iteration algorithm is proposed to optimize 3-D quasi-dense points and camera parameters. In the inner layer, different optimization strategies based on local photometric consistency and a global objective function are employed to optimize the 3-D quasi-dense points and camera parameters, respectively. In the outer layer, quasi-dense correspondences are resampled to guide a new estimation and optimization process of the camera parameters. We demonstrate the effectiveness of our algorithm with several experiments.

  15. saSNP Approach for Scalable SNP Analyses of Multiple Bacterial or Viral Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, Shea [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Slezak, Tom [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-07-27

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.
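
    To make the k-mer idea concrete, here is a minimal sketch that reports a putative SNP wherever the same flanking context occurs exactly once in every genome but with different middle bases. It uses a plain dictionary instead of a suffix array, so it illustrates the principle only and is not the saSNP implementation; the function name, the value of k, and the toy sequences are assumptions.

```python
from collections import defaultdict


def kmer_snps(genomes, k=7):
    """Map each (left flank, right flank) context to the middle base seen in
    each genome, and keep contexts shared by all genomes with >1 allele."""
    half = k // 2
    contexts = defaultdict(dict)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            key = (kmer[:half], kmer[half + 1:])
            seen = contexts[key]
            # A context repeated within one genome is ambiguous: mark as None.
            seen[name] = kmer[half] if name not in seen else None
    snps = {}
    for key, alleles in contexts.items():
        values = list(alleles.values())
        if len(alleles) == len(genomes) and None not in values and len(set(values)) > 1:
            snps[key] = dict(alleles)
    return snps


# Toy sequences differing at a single position (C versus T).
genomes = {"g1": "ACGTACGGTCA", "g2": "ACGTATGGTCA"}
snps = kmer_snps(genomes)
```

    Because only exact flanking matches are compared, no alignment is needed, which is what lets this style of analysis work on unassembled contigs.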

  16. On Computing Breakpoint Distances for Genomes with Duplicate Genes.

    Science.gov (United States)

    Shao, Mingfu; Moret, Bernard M E

    2017-06-01

    A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.
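
    For the easy case mentioned above (two genomes without duplicate genes), the breakpoint distance can be computed directly by counting adjacencies of one genome that are not conserved in the other. The sketch below assumes linear genomes written as signed gene orders and ignores telomeric adjacencies for brevity; it does not address the NP-hard duplicate-gene formulations or the integer-linear program described in the article.

```python
def adjacencies(genome):
    """Set of adjacencies of a signed linear gene order. The pair (a, b) read
    on one strand equals (-b, -a) read on the other, so a canonical form is
    stored for each adjacency."""
    result = set()
    for a, b in zip(genome, genome[1:]):
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are not conserved in g2."""
    if sorted(abs(g) for g in g1) != sorted(abs(g) for g in g2):
        raise ValueError("genomes must contain the same genes, each once")
    return len(adjacencies(g1) - adjacencies(g2))


# Inverting the segment (2, 3) breaks the adjacencies 1-2 and 3-4.
original = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
distance = breakpoint_distance(original, inverted)
```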

  17. Functional Genomic Approaches for the Study of Fetal/Placental Development in Swine with Special Emphasis on Imprinted Genes

    Science.gov (United States)

    The overall focus of this chapter will be the application of functional genomic approaches for the study of the imprinted gene family in swine. While there are varied definitions of “functional genomics” in general they focus on the application of genomic approaches such as DNA microarrays, single n...

  18. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.; Clarke, S. L.; Guturu, H.; Chen, J.; Schaar, B. T.; McLean, C. Y.; Bejerano, G.

    2013-01-01

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.

  19. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.

    2013-02-04

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.

  20. Comparative genome analysis and characterization of the Salmonella Typhimurium strain CCRJ_26 isolated from swine carcasses using whole-genome sequencing approach.

    Science.gov (United States)

    Panzenhagen, P H N; Cabral, C C; Suffys, P N; Franco, R M; Rodrigues, D P; Conte-Junior, C A

    2018-04-01

    Salmonella pathogenicity relies on virulence factors, many of which are clustered within the Salmonella pathogenicity islands. Salmonella also harbours mobile genetic elements such as virulence plasmids, prophage-like elements and antimicrobial resistance genes, which can contribute to increased pathogenicity. Here, we have genetically characterized a selected S. Typhimurium strain (CCRJ_26) from our previous study, with a multidrug-resistant profile and a high-frequency PFGE clonal profile, which apparently persists in the pork production centre of Rio de Janeiro State, Brazil. By whole-genome sequencing, we described the virulence content of the strain's genome and characterized the repertoire of bacterial plasmids, antibiotic resistance genes and prophage-like elements. We show evidence that the genome of strain CCRJ_26 possibly represents a virulence-associated phenotype which may be virulent in human infection. Whole-genome sequencing technologies are still costly and remain underexplored for applied microbiology in Brazil. Hence, this genomic description of S. Typhimurium strain CCRJ_26 will help in future molecular epidemiological studies. The analysis described here reveals a quick and useful pipeline for bacterial virulence characterization using a whole-genome sequencing approach. © 2018 The Society for Applied Microbiology.

  1. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Naz, Anam; Soares, Siomar C.

    2015-01-01

    -genome approach; the predicted conserved gene families (1,193) constitute similar to 77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost....... Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan...

  2. Building a genome database using an object-oriented approach.

    Science.gov (United States)

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.

  3. Equation level matching: An extension of the method of matched asymptotic expansion for problems of wave propagation

    Science.gov (United States)

    Faria, Luiz; Rosales, Rodolfo

    2017-11-01

    We introduce an alternative to the method of matched asymptotic expansions. In the "traditional" implementation, approximate solutions, valid in different (but overlapping) regions, are matched by using "intermediate" variables. Here we propose to match at the level of the equations involved, via a "uniform expansion" whose equations enfold those of the approximations to be matched. This has the advantage that one does not need to explicitly solve the asymptotic equations to do the matching, which can be quite impossible for some problems. In addition, it allows matching to proceed in certain wave situations where the traditional approach fails because the time behaviors differ (e.g., one of the expansions does not include dissipation). On the other hand, this approach does not provide the fairly explicit approximations resulting from standard matching. In fact, this is not even its aim, which is to produce the "simplest" set of equations that capture the behavior. Ruben Rosales's work was partially supported by NSF Grants DMS-1614043 and DMS-1719637.

  4. Single-molecule approach to bacterial genomic comparisons via optical mapping.

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Shiguo [Univ. Wisc.-Madison; Kile, A. [Univ. Wisc.-Madison; Bechner, M. [Univ. Wisc.-Madison; Kvikstad, E. [Univ. Wisc.-Madison; Deng, W. [Univ. Wisc.-Madison; Wei, J. [Univ. Wisc.-Madison; Severin, J. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Forrest, D. [Univ. Wisc.-Madison; Dimalanta, E. [Univ. Wisc.-Madison; Lamers, C. [Univ. Wisc.-Madison; Burland, V. [Univ. Wisc.-Madison; Blattner, F. R. [Univ. Wisc.-Madison; Schwartz, David C. [Univ. Wisc.-Madison

    2004-01-01

    Modern comparative genomics has been established, in part, by the sequencing and annotation of a broad range of microbial species. To gain further insights, new sequencing efforts are now dealing with the variety of strains or isolates that gives a species definition and range; however, this number vastly outstrips our ability to sequence them. Given the availability of a large number of microbial species, new whole genome approaches must be developed to fully leverage this information at the level of strain diversity that maximize discovery. Here, we describe how optical mapping, a single-molecule system, was used to identify and annotate chromosomal alterations between bacterial strains represented by several species. Since whole-genome optical maps are ordered restriction maps, sequenced strains of Shigella flexneri serotype 2a (2457T and 301), Yersinia pestis (CO 92 and KIM), and Escherichia coli were aligned as maps to identify regions of homology and to further characterize them as possible insertions, deletions, inversions, or translocations. Importantly, an unsequenced Shigella flexneri strain (serotype Y strain AMC[328Y]) was optically mapped and aligned with two sequenced ones to reveal one novel locus implicated in serotype conversion and several other loci containing insertion sequence elements or phage-related gene insertions. Our results suggest that genomic rearrangements and chromosomal breakpoints are readily identified and annotated against a prototypic sequenced strain by using the tools of optical mapping.

  5. Gain-of-function mutagenesis approaches in rice for functional genomics and improvement of crop productivity.

    Science.gov (United States)

    Moin, Mazahar; Bakshi, Achala; Saha, Anusree; Dutta, Mouboni; Kirti, P B

    2017-07-01

    The epitome of any genome research is to identify all the existing genes in a genome and investigate their roles. Various techniques have been applied to unveil the functions either by silencing or over-expressing the genes by targeted expression or random mutagenesis. Rice is the most appropriate model crop for generating a mutant resource for functional genomic studies because of the availability of high-quality genome sequence and relatively smaller genome size. Rice has syntenic relationships with members of other cereals. Hence, characterization of functionally unknown genes in rice will possibly provide key genetic insights and can lead to comparative genomics involving other cereals. The current review attempts to discuss the available gain-of-function mutagenesis techniques for functional genomics, emphasizing the contemporary approach, activation tagging and alterations to this method for the enhancement of yield and productivity of rice. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  6. High-density rhesus macaque oligonucleotide microarray design using early-stage rhesus genome sequence information and human genome annotations

    Directory of Open Access Journals (Sweden)

    Magness Charles L

    2007-01-01

    Background: Until recently, few genomic reagents specific for non-human primate research have been available. To address this need, we have constructed a macaque-specific high-density oligonucleotide microarray by using highly fragmented low-pass sequence contigs from the rhesus genome project together with the detailed sequence and exon structure of the human genome. Using this method, we designed oligonucleotide probes to over 17,000 distinct rhesus/human gene orthologs and increased by four-fold the number of available genes relative to our first-generation expressed sequence tag (EST)-derived array. Results: We constructed a database containing 248,000 exon sequences from 23,000 human RefSeq genes and compared each human exon with its best matching sequence in the January 2005 version of the rhesus genome project list of 486,000 DNA contigs. Best matching rhesus exon sequences for each of the 23,000 human genes were then concatenated in the proper order and orientation to produce a rhesus "virtual transcriptome." Microarray probes were designed, one per gene, to the region closest to the 3' untranslated region (UTR) of each rhesus virtual transcript. Each probe was compared to a composite rhesus/human transcript database to test for cross-hybridization potential, yielding a final probe set representing 18,296 rhesus/human gene orthologs, including transcript variants, and over 17,000 distinct genes. We hybridized mRNA from rhesus brain and spleen to both the EST- and genome-derived microarrays. Besides four-fold greater gene coverage, the genome-derived array also showed greater mean signal intensities for genes present on both arrays. Genome-derived probes showed 99.4% identity when compared to 4,767 rhesus GenBank sequence tag site (STS) sequences, indicating that early stage low-pass versions of complex genomes are of sufficient quality to yield valuable functional genomic information when combined with finished genome information from

  7. Histogram Curve Matching Approaches for Object-based Image Classification of Land Cover and Land Use

    Science.gov (United States)

    Toure, Sory I.; Stow, Douglas A.; Weeks, John R.; Kumar, Sunil

    2013-01-01

    The classification of image-objects is usually done using parametric statistical measures of central tendency and/or dispersion (e.g., mean or standard deviation). The objectives of this study were to analyze digital number histograms of image objects and evaluate classification measures exploiting characteristic signatures of such histograms. Two histogram matching classifiers were evaluated and compared to the standard nearest neighbor to mean classifier. An ADS40 airborne multispectral image of San Diego, California was used for assessing the utility of curve matching classifiers in a geographic object-based image analysis (GEOBIA) approach. The classifications were performed with data sets having 0.5 m, 2.5 m, and 5 m spatial resolutions. Results show that histograms are reliable features for characterizing classes. Also, both histogram matching classifiers consistently performed better than the one based on the standard nearest neighbor to mean rule. The highest classification accuracies were produced with images having 2.5 m spatial resolution. PMID:24403648
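
    A histogram matching classifier of the general kind evaluated here can be sketched as follows. Histogram intersection is used as an assumed similarity measure, and the class names and bin counts are invented for illustration; the abstract does not state which curve matching metrics the study actually used.

```python
def normalize(hist):
    """Scale bin counts so they sum to 1, making objects of different sizes
    comparable."""
    total = sum(hist)
    return [count / total for count in hist]


def intersection(p, q):
    """Histogram intersection: 1.0 for identical normalized histograms,
    0.0 for histograms with no overlapping mass."""
    return sum(min(a, b) for a, b in zip(p, q))


def classify(object_hist, class_hists):
    """Assign the class whose reference histogram overlaps most with the
    histogram of the image object."""
    p = normalize(object_hist)
    return max(
        class_hists,
        key=lambda name: intersection(p, normalize(class_hists[name])),
    )


# Hypothetical five-bin digital number histograms for two reference classes.
class_hists = {
    "vegetation": [1, 2, 8, 2, 1],
    "pavement": [8, 3, 1, 1, 1],
}
label = classify([0, 3, 9, 2, 0], class_hists)
```

    Unlike a nearest-neighbor-to-mean rule, this comparison uses the whole shape of the distribution, so two classes with similar means but different spreads can still be separated.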

  8. Adapting legume crops to climate change using genomic approaches.

    Science.gov (United States)

    Mousavi-Derazmahalleh, Mahsa; Bayer, Philipp E; Hane, James K; Babu, Valliyodan; Nguyen, Henry T; Nelson, Matthew N; Erskine, William; Varshney, Rajeev K; Papa, Roberto; Edwards, David

    2018-03-30

    Our agricultural system, and hence food security, is threatened by a combination of events, such as increasing population, the impacts of climate change and the need for more sustainable development. Evolutionary adaptation may help some species to overcome environmental changes through new selection pressures driven by climate change. However, success of evolutionary adaptation is dependent on various factors, one of which is the extent of genetic variation available within species. Genomic approaches provide an exceptional opportunity to identify genetic variation that can be employed in crop improvement programs. In this review, we illustrate some of the routinely used genomics-based methods as well as recent breakthroughs, which facilitate assessment of genetic variation and discovery of adaptive genes in legumes. While additional information is needed, the current utility of selection tools indicates a robust ability to utilize existing variation among legumes to address the challenges of climate uncertainty. This article is protected by copyright. All rights reserved.

  9. BAUM: Improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach.

    Science.gov (United States)

    Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M

    2018-01-15

    It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using Second Generation Sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the Single-Molecule Real-Time sequencing technology is very promising for overcoming the uncertainty issue, its relatively high cost and error rate add to the burden on budget or computation. Many long-read assemblers adopt the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then local OLC is used to assemble each region in parallel. BAUM can (1) perform reference-assisted assembly based on the genome of a close species, or (2) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. In addition, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Contact: lilei@amss.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. Genome-wide analytical approaches for reverse metabolic engineering of industrially relevant phenotypes in yeast

    Science.gov (United States)

    Oud, Bart; Maris, Antonius J A; Daran, Jean-Marc; Pronk, Jack T

    2012-01-01

    Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. PMID:22152095

  11. Genome-wide analytical approaches for reverse metabolic engineering of industrially relevant phenotypes in yeast.

    Science.gov (United States)

    Oud, Bart; van Maris, Antonius J A; Daran, Jean-Marc; Pronk, Jack T

    2012-03-01

    Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  12. Empirically Examining the Performance of Approaches to Multi-Level Matching to Study the Effect of School-Level Interventions

    Science.gov (United States)

    Hallberg, Kelly; Cook, Thomas D.; Figlio, David

    2013-01-01

    The goal of this paper is to provide guidance for applied education researchers in using multi-level data to study the effects of interventions implemented at the school level. Two primary approaches are currently employed in observational studies of the effect of school-level interventions. One approach employs intact school matching: matching…

  13. Genome Modeling System: A Knowledge Management Platform for Genomics.

    Directory of Open Access Journals (Sweden)

    Malachi Griffith

    2015-07-01

    Full Text Available In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.

  14. Whole genome DNA copy number changes identified by high density oligonucleotide arrays

    Directory of Open Access Journals (Sweden)

    Huang Jing

    2004-05-01

    Full Text Available Abstract Changes in DNA copy number are one of the hallmarks of the genetic instability common to most human cancers. Previous micro-array-based methods have been used to identify chromosomal gains and losses; however, they are unable to genotype alleles at the level of single nucleotide polymorphisms (SNPs). Here we describe a novel algorithm that uses a recently developed high-density oligonucleotide array-based SNP genotyping method, whole genome sampling analysis (WGSA), to identify genome-wide chromosomal gains and losses at high resolution. WGSA simultaneously genotypes over 10,000 SNPs by allele-specific hybridisation to perfect match (PM) and mismatch (MM) probes synthesised on a single array. The copy number algorithm jointly uses PM intensity and discrimination ratios between paired PM and MM intensity values to identify and estimate genetic copy number changes. Values from an experimental sample are compared with SNP-specific distributions derived from a reference set containing over 100 normal individuals to gain statistical power. Genomic regions with statistically significant copy number changes can be identified using both single point analysis and contiguous point analysis of SNP intensities. We identified multiple regions of amplification and deletion using a panel of human breast cancer cell lines. We verified these results using an independent method based on quantitative polymerase chain reaction and found that our approach is both sensitive and specific and can tolerate samples which contain a mixture of both tumour and normal DNA. In addition, by using known allele frequencies from the reference set, statistically significant genomic intervals can be identified containing contiguous stretches of homozygous markers, potentially allowing the detection of regions undergoing loss of heterozygosity (LOH) without the need for a matched normal control sample. The coupling of LOH analysis, via SNP genotyping, with copy number

  15. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Sungho Won

    2009-11-01

    Full Text Available For genome-wide association studies in family-based designs, we propose a new, universally applicable approach. The new test statistic exploits all available information about the association, while, by virtue of its design, it maintains the same robustness against population admixture as traditional family-based approaches that are based exclusively on the within-family information. The approach is suitable for the analysis of almost any trait type, e.g. binary, continuous, time-to-onset, multivariate, etc., and combinations of those. We use simulation studies to verify all theoretically derived properties of the approach, estimate its power, and compare it with other standard approaches. We illustrate the practical implications of the new analysis method by an application to a lung-function phenotype, forced expiratory volume in one second (FEV1), in four genome-wide association studies.

  16. Identification of meat products by shotgun spectral matching

    DEFF Research Database (Denmark)

    Ohana, D.; Dalebout, H.; Marissen, R. J.

    2016-01-01

    A new method, based on shotgun spectral matching of peptide tandem mass spectra, was successfully applied to the identification of different food species. The method was demonstrated to work on raw as well as processed samples from 16 mammalian and 10 bird species by counting spectral matches...... to spectral libraries in a reference database with one spectral library per species. A phylogenetic tree could also be constructed directly from the spectra. Nearly all samples could be correctly identified at the species level, and 100% at the genus level. The method does not use any genomic information...

  17. Post-genomic approaches to understanding interactions between fungi and their environment.

    Science.gov (United States)

    de Vries, Ronald P; Benoit, Isabelle; Doehlemann, Gunther; Kobayashi, Tetsuo; Magnuson, Jon K; Panisko, Ellen A; Baker, Scott E; Lebrun, Marc-Henri

    2011-06-01

    Fungi inhabit every natural and anthropogenic environment on Earth. They have highly varied life-styles including saprobes (using only dead biomass as a nutrient source), pathogens (feeding on living biomass), and symbionts (co-existing with other organisms). These distinctions are not absolute as many species employ several life styles (e.g. saprobe and opportunistic pathogen, saprobe and mycorrhiza). To efficiently survive in these different and often changing environments, fungi need to be able to modify their physiology and in some cases will even modify their local environment. Understanding the interaction between fungi and their environments has been a topic of study for many decades. However, recently these studies have reached a new dimension. The availability of fungal genomes and development of post-genomic technologies for fungi, such as transcriptomics, proteomics and metabolomics, have enabled more detailed studies into this topic resulting in new insights. Based on a Special Interest Group session held during IMC9, this paper provides examples of the recent advances in using (post-)genomic approaches to better understand fungal interactions with their environments.

  18. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    Science.gov (United States)

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of
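    The two information-theoretic quantities named in this abstract can be illustrated on plain discrete distributions. The sketch below is only an assumption-laden illustration: it computes the Shannon entropy and the Jensen-Shannon distance for two probability vectors supplied by hand, and does not reproduce the authors' 1D Ising model or their estimation procedure. The function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p that sums to 1."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2) between p and q.

    With base-2 logarithms the result lies between 0 (identical
    distributions) and 1 (distributions with disjoint support).
    """
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical methylation-level distributions for a test and a reference sample.
test_sample = [0.7, 0.2, 0.1]
reference_sample = [0.1, 0.2, 0.7]
distance = jensen_shannon_distance(test_sample, reference_sample)
```

    Using base-2 logarithms keeps the distance bounded by 1, which makes values from different genomic regions directly comparable.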

  19. A robust approach to optimal matched filter design in ultrasonic non-destructive evaluation (NDE)

    Science.gov (United States)

    Li, Minghui; Hayward, Gordon

    2017-02-01

    The matched filter has been demonstrated to be a powerful yet efficient technique to enhance defect detection and imaging in ultrasonic non-destructive evaluation (NDE) of coarse grain materials, provided that the filter is properly designed and optimized. In the literature, in order to accurately approximate the defect echoes, the design utilized the real excitation signals, which made it time consuming and less straightforward to implement in practice. In this paper, we present a more robust and flexible approach to optimal matched filter design using simulated excitation signals. The control parameters are chosen and optimized based on the real scenario of the array transducer, the transmitter-receiver system response, and the test sample; as a result, the filter response is optimized and depends on the material characteristics. Experiments on industrial samples are conducted and the results confirm the great benefits of the method.
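    As a reminder of what a matched filter does in general, the sketch below slides a template over a received signal and records the correlation at each lag; the lag with the largest output is the most likely echo position. It is a textbook illustration under stated assumptions, not the optimized design from this paper: the function name, the toy signal, and the template are all hypothetical.

```python
def matched_filter(signal, template):
    """Correlate a real-valued signal with a template at every valid lag.

    Returns a list whose k-th entry is the inner product of the template
    with the signal segment starting at index k.
    """
    n, m = len(signal), len(template)
    return [
        sum(signal[k + j] * template[j] for j in range(m))
        for k in range(n - m + 1)
    ]


# Toy received signal containing one echo shaped like the template.
received = [0, 0, 1, 2, 1, 0, 0]
echo_template = [1, 2, 1]

response = matched_filter(received, echo_template)
best_lag = max(range(len(response)), key=response.__getitem__)
```

    In practice the template would come from a measured or simulated excitation signal, which is the design choice the abstract discusses.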

  20. The Human Genome Project and the social contract: a law policy approach.

    Science.gov (United States)

    Byk, C

    1992-08-01

    For the first time in history, genetics will enable science to completely identify each human as genetically unique. Will this knowledge reinforce the trend for more individual liberties or will it create a 'brave new world'? A law policy approach to the problems raised by the human genome project shows how far our democratic institutions are from being the proper forum to discuss such issues. Because of the fears and anxiety raised in the population, and also because of its wide implications on the everyday life, the human genome analysis more than any other project needs to succeed in setting up such a social assessment.

  1. History Matching in Parallel Computational Environments

    Energy Technology Data Exchange (ETDEWEB)

    Steven Bryant; Sanjay Srinivasan; Alvaro Barrera; Sharad Yadav

    2004-08-31

    In the probabilistic approach for history matching, the information from the dynamic data is merged with the prior geologic information in order to generate permeability models consistent with the observed dynamic data as well as the prior geology. The relationship between dynamic response data and reservoir attributes may vary in different regions of the reservoir due to spatial variations in reservoir attributes, fluid properties, well configuration, flow constraints on wells, etc. This implies that the probabilistic approach should update different regions of the reservoir in different ways, which necessitates delineation of multiple reservoir domains in order to increase the accuracy of the approach. The research focuses on a probabilistic approach to integrating dynamic data that ensures consistency between reservoir models developed from one stage to the next. The algorithm relies on efficient parameterization of the dynamic data integration problem and permits rapid assessment of the updated reservoir model at each stage. The report also outlines various domain decomposition schemes from the perspective of increasing the accuracy of the probabilistic approach to history matching. Research progress in three important areas of the project is discussed: (1) validation and testing of the probabilistic approach to incorporating production data in reservoir models; (2) development of a robust scheme for identifying reservoir regions that will result in a more robust parameterization of the history matching process; and (3) testing of commercial simulators for parallel capability and development of a parallel algorithm for history matching.

  2. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server is designed to provide comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  3. High-speed and high-ratio referential genome compression.

    Science.gov (United States)

    Liu, Yuansheng; Peng, Hui; Wong, Limsoon; Li, Jinyan

    2017-11-01

    The rapidly increasing number of genomes generated by high-throughput sequencing platforms and assembly algorithms is accompanied by problems in data storage, compression and communication. Traditional compression algorithms are unable to meet the demand for high compression ratios due to the intrinsically challenging features of DNA sequences, such as small alphabet size, frequent repeats and palindromes. Reference-based lossless compression, by which only the differences between two similar genomes are stored, is a promising approach with a high compression ratio. We present a high-performance referential genome compression algorithm named HiRGC. It is based on a 2-bit encoding scheme and an advanced greedy-matching search on a hash table. We compare the performance of HiRGC with four state-of-the-art compression methods on a benchmark dataset of eight human genomes. HiRGC compresses about 21 gigabytes of each set of the seven target genomes into 96-260 megabytes, achieving compression ratios of 217 to 82 times. This performance is at least 1.9 times better than the best competing algorithm on its best case. Our compression speed is also at least 2.9 times faster. HiRGC is stable and robust in dealing with different reference genomes. In contrast, the competing methods' performance varies widely on different reference genomes. More experiments on 100 human genomes from the 1000 Genomes Project and on genomes of several other species again demonstrate that HiRGC's performance is consistently excellent. The C++ and Java source codes of our algorithm are freely available for academic and non-commercial use. They can be downloaded from https://github.com/yuansliu/HiRGC. jinyan.li@uts.edu.au. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
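    A 2-bit encoding stores each of the four DNA bases in two bits, so four bases fit in one byte. The sketch below illustrates that idea only; the base-to-code mapping and the function names are assumptions made for this example and are not taken from the HiRGC source code, and the greedy hash-table matching step is not shown.

```python
# Assumed mapping for illustration; any fixed assignment of the four
# bases to the values 0..3 would work equally well.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def pack_2bit(seq):
    """Pack a string over the alphabet ACGT into bytes, four bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= BASE_TO_CODE[base] << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, length):
    """Recover the first `length` bases from a buffer produced by pack_2bit."""
    return "".join(
        CODE_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 3]
        for i in range(length)
    )


sequence = "GATTACA"
encoded = pack_2bit(sequence)
decoded = unpack_2bit(encoded, len(sequence))
```

    The sequence length has to be stored alongside the packed bytes, because the final byte may be only partly filled.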

  4. Online tuning of impedance matching circuit for long pulse inductively coupled plasma source operation—An alternate approach

    International Nuclear Information System (INIS)

    Sudhir, Dass; Bandyopadhyay, M.; Chakraborty, A.; Kraus, W.; Gahlaut, A.; Bansal, G.

    2014-01-01

    The impedance matching circuit placed between the radio frequency (RF) generator and the plasma load determines the RF power transfer from the generator to the load. The impedance of the plasma load depends on the plasma parameters through skin depth and plasma conductivity or resistivity. Therefore, for long pulse operation of inductively coupled plasmas, particularly at high power (∼100 kW or more) where the plasma load condition may vary for different reasons (e.g., pressure, power, and thermal), online tuning of the impedance matching circuit through feedback is necessary. In fusion grade ion source operation, such an online methodology through feedback is not present, but offline remote tuning, by adjusting the matching circuit capacitors and tuning the driving frequency of the RF generator between the ion source operation pulses, is envisaged. The present model is an approach to a remote impedance tuning methodology for long pulse operation, and the corresponding online impedance matching algorithm, based on RF coil antenna current measurement or coil antenna calorimetric measurement, may be useful in this regard.

  5. A methodological study of genome-wide DNA methylation analyses using matched archival formalin-fixed paraffin embedded and fresh frozen breast tumors.

    Science.gov (United States)

    Espinal, Allyson C; Wang, Dan; Yan, Li; Liu, Song; Tang, Li; Hu, Qiang; Morrison, Carl D; Ambrosone, Christine B; Higgins, Michael J; Sucheston-Campbell, Lara E

    2017-02-28

    DNA from archival formalin-fixed and paraffin embedded (FFPE) tissue is an invaluable resource for genome-wide methylation studies, although concerns about poor quality may limit its use. In this study, we compared DNA methylation profiles of breast tumors using DNA from fresh-frozen (FF) tissues and three types of matched FFPE samples. For 9/10 patients, correlation and unsupervised clustering analysis revealed that the FF and FFPE samples were consistently correlated with each other and clustered into distinct subgroups. Greater than 84% of the top 100 loci previously shown to differentiate ER+ and ER- tumors in FF tissues were also FFPE DML. Weighted Correlation Gene Network Analyses (WCGNA) grouped the DML loci into 16 modules in FF tissue, with ~85% of the module membership preserved across tissue types. Restored FFPE and matched FF samples were profiled using the Illumina Infinium HumanMethylation450K platform. Methylation levels (β-values) across all loci and the top 100 loci previously shown to differentiate tumors by estrogen receptor status (ER+ or ER-) in a larger FF study were compared between matched FF and FFPE samples using Pearson's correlation, hierarchical clustering and WCGNA. Positive predictive values and sensitivity levels for detecting differentially methylated loci (DML) in FF samples were calculated in an independent FFPE cohort. FFPE breast tumor samples show lower overall detection of DMLs than FF samples; however, FFPE and FF DMLs compare favorably. These results support the emerging consensus that the 450K platform can be employed to investigate epigenetics in large sets of archival FFPE tissues.

  6. The European Renal Genome Project: An Integrated Approach Towards Understanding the Genetics of Kidney Development and Disease

    OpenAIRE

    Willnow, TE; Antignac, C; Brändli, AW; Christensen, EI; Cox, RD; Davidson, D; Davies, JA; Devuyst, O; Eichele, G; Hastie, ND; Verroust, PJ; Schedl, A; Meij, IC

    2005-01-01

    Rapid progress in genome research creates a wealth of information on the functional annotation of mammalian genome sequences. However, as we accumulate large amounts of scientific information we are facing problems of how to integrate and relate the data produced by various genomic approaches. Here, we propose the novel concept of an organ atlas where diverse data from expression maps to histological findings to mutant phenotypes can be queried, compared and visualized in the context of a thr...

  7. A Quantitative Genomic Approach for Analysis of Fitness and Stress Related Traits in a Drosophila melanogaster Model Population

    Directory of Open Access Journals (Sweden)

    Palle Duun Rohde

    2016-01-01

    Full Text Available The ability of natural populations to withstand environmental stresses relies partly on their adaptive ability. In this study, we used a subset of the Drosophila Genetic Reference Panel, a population of inbred, genome-sequenced lines derived from a natural population of Drosophila melanogaster, to investigate whether this population harbors genetic variation for a set of stress resistance and life history traits. Using a genomic approach, we found substantial genetic variation for metabolic rate, heat stress resistance, expression of a major heat shock protein, and egg-to-adult viability investigated at a benign and a higher stressful temperature. This suggests that these traits will be able to evolve. In addition, we outline an approach to conduct pathway associations based on genomic linear models, which has potential to identify adaptive genes and pathways, and therefore can be a valuable tool in conservation genomics.

  8. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    Science.gov (United States)

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.

  9. Genome-scale metabolic models applied to human health and disease.

    Science.gov (United States)

    Cook, Daniel J; Nielsen, Jens

    2017-11-01

    Advances in genome sequencing, high throughput measurement of gene and protein expression levels, data accessibility, and computational power have allowed genome-scale metabolic models (GEMs) to become a useful tool for understanding metabolic alterations associated with many different diseases. Despite the proven utility of GEMs, researchers confront multiple challenges in the use of GEMs, their application to human health and disease, and their construction and simulation in an organ-specific and disease-specific manner. Several approaches that researchers are taking to address these challenges include using proteomic and transcriptomic-informed methods to build GEMs for individual organs, diseases, and patients and using constraints on model behavior during simulation to match observed metabolic fluxes. We review the challenges facing researchers in the use of GEMs, review the approaches used to address these challenges, and describe advances that are on the horizon and could lead to a better understanding of human metabolism. WIREs Syst Biol Med 2017, 9:e1393. doi: 10.1002/wsbm.1393 For further resources related to this article, please visit the WIREs website. © 2017 Wiley Periodicals, Inc.

  10. Comparative Genomics in Switchgrass Using 61,585 High-Quality Expressed Sequence Tags

    Directory of Open Access Journals (Sweden)

    Christian M. Tobias

    2008-11-01

    Full Text Available The development of genomic resources for switchgrass (Panicum virgatum L.), a perennial NAD-malic enzyme type C4 grass, is required to enable molecular breeding and biotechnological approaches for improving its value as a forage and bioenergy crop. Expressed sequence tag (EST) sequencing is one method that can quickly sample gene inventories and produce data suitable for marker development or analysis of tissue-specific patterns of expression. Toward this goal, three cDNA libraries from callus, crown, and seedling tissues of ‘Kanlow’ switchgrass were end-sequenced to generate a total of 61,585 high-quality ESTs from 36,565 separate clones. Seventy-three percent of the assembled consensus sequences could be aligned with the sorghum [Sorghum bicolor (L.) Moench] genome at an E-value of <1 × 10, indicating a high degree of similarity. Sixty-five percent of the ESTs matched with gene ontology molecular terms, and 3.3% of the sequences were matched with genes that play potential roles in cell-wall biogenesis. The representation in the three libraries of gene families known to be associated with C4 photosynthesis, cellulose and β-glucan synthesis, phenylpropanoid biosynthesis, and peroxidase activity indicated likely roles for individual family members. Pairwise comparisons of synonymous codon substitutions were used to assess genome sequence diversity and indicated an overall similarity between the two genome copies present in the tetraploid. Identification of EST–simple sequence repeat markers and amplification on two individual parents of a mapping population yielded an average of 2.18 amplicons per individual, and 35% of the markers produced fragment length polymorphisms.

  11. Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.

    Science.gov (United States)

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2015-01-01

    Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) is required for self-correction and for subsequent de novo assembly. Recently developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which takes corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome. Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting

  12. Mapping topographic plant location properties using a dense matching approach

    Science.gov (United States)

    Niederheiser, Robert; Rutzinger, Martin; Lamprecht, Andrea; Bardy-Durchhalter, Manfred; Pauli, Harald; Winkler, Manuela

    2017-04-01

    Within the project MEDIALPS (Disentangling anthropogenic drivers of climate change impacts on alpine plant species: Alps vs. Mediterranean mountains) six regions in Alpine and in Mediterranean mountain regions are investigated to assess how plant species respond to climate change. The project is embedded in the Global Observation Research Initiative in Alpine Environments (GLORIA), which is a well-established global monitoring initiative for systematic observation of changes in the plant species composition and soil temperature on mountain summits worldwide to discern accelerating climate change pressures on these fragile alpine ecosystems. Close-range sensing techniques such as terrestrial photogrammetry are well suited for mapping terrain topography of small areas with high resolution. Lightweight equipment, flexible positioning for image acquisition in the field, and independence on weather conditions (i.e. wind) make this a feasible method for in-situ data collection. New developments of dense matching approaches allow high quality 3D terrain mapping with less requirements for field set-up. However, challenges occur in post-processing and required data storage if many sites have to be mapped. Within MEDIALPS dense matching is used for mapping high resolution topography for 284 3x3 meter plots deriving information on vegetation coverage, roughness, slope, aspect and modelled solar radiation. This information helps identifying types of topography-dependent ecological growing conditions and evaluating the potential for existing refugial locations for specific plant species under climate change. This research is conducted within the project MEDIALPS - Disentangling anthropogenic drivers of climate change impacts on alpine plant species: Alps vs. Mediterranean mountains funded by the Earth System Sciences Programme of the Austrian Academy of Sciences.

  13. An optimized electroporation approach for efficient CRISPR/Cas9 genome editing in murine zygotes.

    Directory of Open Access Journals (Sweden)

    Simon E Tröder

    Full Text Available Electroporation of zygotes represents a rapid alternative to the elaborate pronuclear injection procedure for CRISPR/Cas9-mediated genome editing in mice. However, current protocols for electroporation require either investment in specialized electroporators or corrosive pre-treatment of zygotes, which compromises embryo viability. Here, we describe an easily adaptable approach for the introduction of specific mutations in C57BL/6 mice by electroporation of intact zygotes using a common electroporator with synthetic CRISPR/Cas9 components and minimal technical requirements. Direct comparison to conventional pronuclear injection demonstrates significantly reduced physical damage and thus improved embryo development, with successful genome editing in up to 100% of living offspring. Hence, our novel approach for Easy Electroporation of Zygotes (EEZy) allows highly efficient generation of CRISPR/Cas9 transgenic mice while reducing the number of animals required.

  14. Alignment-free phylogeny of whole genomes using underlying subwords

    Directory of Open Access Journals (Sweden)

    Comin Matteo

    2012-12-01

    Full Text Available Abstract Background With the progress of modern sequencing technologies, a large number of complete genomes are now available. Traditionally, the comparison of two related genomes is carried out by sequence alignment. There are cases where these techniques cannot be applied, for example if two genomes do not share the same set of genes, or if they are not alignable to each other due to low sequence similarity, rearrangements and inversions, or more specifically due to their lengths when the organisms belong to different species. For these cases the comparison of complete genomes can be carried out only with ad hoc methods that are usually called alignment-free methods. Methods In this paper we propose a distance function based on subword compositions called the Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms able to capture the statistics of common words between two sequences, can be derived from a small set of “independent” subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords, such that each region of the genomes contributes only once, thus avoiding counting shared subwords multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other, more significant subwords. Results The Underlying Approach (UA) builds a scoring function based on this set of patterns, called underlying. We prove that this set is by construction linear in the size of the input, without overlaps, and can be efficiently constructed. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms the current state-of-the-art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information. Availability http://www.dei.unipd.it/∼ciompin/main/underlying.html
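
    As a rough illustration of the matching statistics mentioned in this abstract: for each position i of one sequence, one records the length of the longest substring starting at i that also occurs somewhere in the other sequence. The sketch below is a naive version suitable only for short strings; the function name and the toy sequences are assumptions made for illustration, and it does not implement the UA distance itself, which relies on irredundant common subwords.

```python
def matching_statistics(s, t):
    """For each i, length of the longest prefix of s[i:] that occurs in t."""
    stats = []
    for i in range(len(s)):
        length = 0
        # Extend the match while the growing substring still occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        stats.append(length)
    return stats


if __name__ == "__main__":
    # Toy sequences (hypothetical), not real genomic data.
    print(matching_statistics("ACGT", "CGTA"))
```

    Practical implementations would typically use a suffix tree or suffix array rather than repeated substring searches; the naive loop is kept here only because it makes the definition easy to read.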

  15. Genomics approaches in the understanding of Entamoeba ...

    African Journals Online (AJOL)

    Entamoeba histolytica is the intestinal protozoan parasite responsible for amebic colitis and liver abscesses, which cause mortality in many developing countries. The sequencing of the parasite genome provides new insights into the cellular workings and genome evolution of this major human pathogen. Here, we reviewed ...

  16. Faustoviruses: Comparative genomics of new Megavirales family members

    Directory of Open Access Journals (Sweden)

    Samia eBenamar

    2016-02-01

    Full Text Available An emerging interest in the giant virus discovery process, genome sequencing and analysis has allowed an expansion of the number of known Megavirales members. Using the protist Vermamoeba sp. as cell support, a new giant virus named Faustovirus has been isolated. In this study, we describe the genome sequences of nine Faustoviruses and build a genomic comparison in order to have a comprehensive overview of genomic composition and diversity among this new virus family. The average sequence length of these viruses is 467,592.44 bp (ranging from 455,803 bp to 491,024 bp), making them the fourth largest Megavirales genome after Mimiviruses, Pandoraviruses and Pithovirus sibericum. Faustovirus genomes displayed an average G+C content of 37.14% (ranging from 36.22% to 39.59%), which is close to the G+C content of the Asfarviridae genomes (38%). The proportion of best matches and the phylogenetic analysis suggest a shared origin with Asfarviridae without belonging to the same family. The core-gene-based phylogeny of Faustoviruses identified four lineages. These results were confirmed by the analysis of amino acids and COGs category distribution. The diversity of the gene composition of these lineages is mainly explained by gene deletion or acquisition, with some exceptions due to gene duplications. The high proportion of best matches from Bacteria and Phycodnaviridae in the pan-genome and unique genes may be explained by an interaction occurring after the separation of the lineages. The Faustovirus core-genome appears to stabilize at around 207 genes, whereas the pan-genome is described as an open pan-genome; its enrichment via the discovery of new Faustoviruses is required to better capture the genomic diversity of this family.

  17. Tracembler – software for in-silico chromosome walking in unassembled genomes

    Directory of Open Access Journals (Sweden)

    Wilkerson Matthew D

    2007-05-01

    Full Text Available Abstract Background Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user. Results Tracembler takes one or multiple DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively such that BLAST matching sequences identified in previous rounds of searches are used as new queries in subsequent rounds of BLAST searches. The recursive BLAST search stops when either no more new matching sequences are found, a given maximal number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length Chrm2 gene as well as its upstream and downstream genomic regions from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus. Conclusion Tracembler streamlines the process of recursive database searches, sequence assembly, and gene
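
    The recursive search loop and its three stopping conditions described in this abstract can be sketched as follows. This is a minimal illustration, not Tracembler's code: the `search` callback is a hypothetical stand-in for a real BLAST query against the Trace Archive, the parameter names and default limits are assumptions, and the subsequent CAP3 assembly step is not shown.

```python
def recursive_search(seeds, search, max_rounds=5, max_queries=100):
    """Collect reads reachable from the seeds through repeated searches.

    `search` is a caller-supplied function mapping a query to an iterable of
    matching read identifiers (a hypothetical stand-in for a BLAST call).
    """
    found = set()
    queries = list(seeds)
    used = 0
    for _ in range(max_rounds):          # stop: maximum rounds reached
        new_hits = set()
        for q in queries:
            if used >= max_queries:      # stop: query budget exhausted
                break
            used += 1
            new_hits.update(h for h in search(q) if h not in found)
        if not new_hits:                 # stop: no new matching sequences
            break
        found |= new_hits
        queries = sorted(new_hits)       # new hits become the next queries
        if used >= max_queries:
            break
    return found


if __name__ == "__main__":
    # Toy "database": which reads each query matches (hypothetical data).
    toy = {"seed": ["r1", "r2"], "r1": ["r2", "r3"], "r2": [], "r3": ["r1"]}
    print(sorted(recursive_search(["seed"], lambda q: toy.get(q, []))))
```

    Passing the search function as an argument keeps the loop testable without network access; a real pipeline would replace the lambda with a call to whatever search service is available.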

  18. Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis.

    Science.gov (United States)

    Mason, Annaliese S; Rousseau-Gueutin, Mathieu; Morice, Jérôme; Bayer, Philipp E; Besharat, Naghmeh; Cousin, Anouska; Pradhan, Aneeta; Parkin, Isobel A P; Chèvre, Anne-Marie; Batley, Jacqueline; Nelson, Matthew N

    2016-02-01

    Locating centromeres on genome sequences can be challenging. The high density of repetitive elements in these regions makes sequence assembly problematic, especially when using short-read sequencing technologies. It can also be difficult to distinguish between active and recently extinct centromeres through sequence analysis. An effective solution is to identify genetically active centromeres (functional in meiosis) by half-tetrad analysis. This genetic approach involves detecting heterozygosity along chromosomes in segregating populations derived from gametes (half-tetrads). Unreduced gametes produced by first division restitution mechanisms comprise complete sets of nonsister chromatids. Along these chromatids, heterozygosity is maximal at the centromeres, and homologous recombination events result in homozygosity toward the telomeres. We genotyped populations of half-tetrad-derived individuals (from Brassica interspecific hybrids) using a high-density array of physically anchored SNP markers (Illumina Brassica 60K Infinium array). Mapping the distribution of heterozygosity in these half-tetrad individuals allowed the genetic mapping of all 19 centromeres of the Brassica A and C genomes to the reference Brassica napus genome. Gene and transposable element density across the B. napus genome were also assessed and corresponded well to previously reported genetic map positions. Known centromere-specific sequences were located in the reference genome, but mostly matched unanchored sequences, suggesting that the core centromeric regions may not yet be assembled into the pseudochromosomes of the reference genome. The increasing availability of genetic markers physically anchored to reference genomes greatly simplifies the genetic and physical mapping of centromeres using half-tetrad analysis. We discuss possible applications of this approach, including in species where half-tetrads are currently difficult to isolate. 
    Copyright © 2016 by the Genetics Society of America.
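
    The core idea of this abstract, that heterozygosity along a chromosome is maximal at the centromere in half-tetrad-derived individuals, can be illustrated with a small sketch. The data layout (one list of 0/1 calls per individual, 1 meaning heterozygous), the function names and the toy positions are all assumptions for illustration; the study itself used SNP array genotypes and its own mapping procedure.

```python
def heterozygosity_by_marker(genotypes):
    """genotypes: per-individual lists, 1 = heterozygous, 0 = homozygous."""
    n = len(genotypes)
    # zip(*genotypes) yields one column of calls per marker.
    return [sum(col) / n for col in zip(*genotypes)]


def peak_interval(positions, het):
    """Return (start, end) positions spanning the markers with maximal heterozygosity."""
    top = max(het)
    idx = [i for i, h in enumerate(het) if h == top]
    return positions[idx[0]], positions[idx[-1]]


if __name__ == "__main__":
    # Four hypothetical individuals genotyped at five ordered markers.
    calls = [
        [0, 1, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [1, 1, 1, 0, 0],
        [0, 1, 1, 1, 1],
    ]
    het = heterozygosity_by_marker(calls)
    print(het, peak_interval([10, 20, 30, 40, 50], het))
```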

  19. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    Full Text Available As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  20. Evaluation of multiple approaches to identify genome-wide polymorphisms in closely related genotypes of sweet cherry (Prunus avium L.)

    Directory of Open Access Journals (Sweden)

    Seanna Hewitt

    Full Text Available Identification of genetic polymorphisms and subsequent development of molecular markers is important for marker assisted breeding of superior cultivars of economically important species. Sweet cherry (Prunus avium L.) is an economically important non-climacteric tree fruit crop in the Rosaceae family and has undergone a genetic bottleneck due to breeding, resulting in limited genetic diversity in the germplasm that is utilized for breeding new cultivars. Therefore, it is critical to recognize the best platforms for identifying genome-wide polymorphisms that can help identify, and consequently preserve, the diversity in a genetically constrained species. A gel-based approach (TRAP), reduced representation sequencing (TRAPseq), a 6k cherry SNParray, and whole genome sequencing (WGS) were evaluated for the identification of genome-wide polymorphisms in five closely related genotypes of sweet cherry. All platforms facilitated detection of polymorphisms among the genotypes with variable efficiency. In assessing multiple SNP detection platforms, this study has demonstrated that a combination of appropriate approaches is necessary for efficient polymorphism identification, especially between closely related cultivars of a species. The information generated in this study provides a valuable resource for future genetic and genomic studies in sweet cherry, and the insights gained from the evaluation of multiple approaches can be utilized for other closely related species with limited genetic diversity in the breeding germplasm. Keywords: Polymorphisms, Prunus avium, Next-generation sequencing, Target region amplification polymorphism (TRAP), Genetic diversity, SNParray, Reduced representation sequencing, Whole genome sequencing (WGS)

  1. A quantitative experimental paradigm to optimize construction of rank order lists in the National Resident Matching Program: the ROSS-MOORE approach.

    Science.gov (United States)

    Ross, David A; Moore, Edward Z

    2013-09-01

    As part of the National Resident Matching Program, programs must submit a rank order list of desired applicants. Despite the importance of this process and the numerous manifest limitations with traditional approaches, minimal research has been conducted to examine the accuracy of different ranking strategies. The authors developed the Moore Optimized Ordinal Rank Estimator (MOORE), a novel algorithm for ranking applicants that is based on college sports ranking systems. Because it is not possible to study the Match in vivo, the authors then designed the Recruitment Outcomes Simulation System (ROSS). This program was used to simulate a series of interview seasons and to compare MOORE and traditional approaches under different conditions. The accuracy of traditional ranking and the MOORE approach are equally and adversely affected with higher levels of intrarater variability. However, compared with traditional ranking methods, MOORE produces a more accurate rank order list as interrater variability increases. The present data demonstrate three key findings. First, they provide proof of concept that it is possible to scientifically test the accuracy of different rank methods used in the Match. Second, they show that small amounts of variability can have a significant adverse impact on the accuracy of rank order lists. Finally, they demonstrate that an ordinal approach may lead to a more accurate rank order list in the presence of interviewer bias. The ROSS-MOORE approach offers programs a novel way to optimize the recruitment process and, potentially, to construct a more accurate rank order list.

  2. Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

    Science.gov (United States)

    Thybert, David; Roller, Maša; Navarro, Fábio C.P.; Fiddes, Ian; Streeter, Ian; Feig, Christine; Martin-Galvez, David; Kolmogorov, Mikhail; Janoušek, Václav; Akanni, Wasiu; Aken, Bronwen; Aldridge, Sarah; Chakrapani, Varshith; Chow, William; Clarke, Laura; Cummins, Carla; Doran, Anthony; Dunn, Matthew; Goodstadt, Leo; Howe, Kerstin; Howell, Matthew; Josselin, Ambre-Aurore; Karn, Robert C.; Laukaitis, Christina M.; Jingtao, Lilue; Martin, Fergal; Muffato, Matthieu; Nachtweide, Stefanie; Quail, Michael A.; Sisu, Cristina; Stanke, Mario; Stefflova, Klara; Van Oosterhout, Cock; Veyrunes, Frederic; Ward, Ben; Yang, Fengtang; Yazdanifar, Golbahar; Zadissa, Amonida; Adams, David J.; Brazma, Alvis; Gerstein, Mark; Paten, Benedict; Pham, Son; Keane, Thomas M.; Odom, Duncan T.; Flicek, Paul

    2018-01-01

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology. PMID:29563166

  3. Single Cell HLA Matching Feasibility by Whole Genomic Amplification and Nested PCR

    Institute of Scientific and Technical Information of China (English)

    Xiao-hong Li; Fang-yin Meng

    2004-01-01

    PCR-based single-cell DNA analysis has been widely used in forensic science, preimplantation genetic diagnosis and other fields. However, the original sample cannot be efficiently retrieved following single-cell PCR, and consequently the amount of information gained is limited. The HLA system is so complex that it is very hard to complete HLA typing from a single cell. A Taq polymerase-based method that uses random primers to amplify the whole genome, termed whole genome amplification (WGA), has been shown to be useful for increasing the number of copies obtained from a minimal sample. In this study we establish a technique to amplify the HLA-A and HLA-B loci at the same time in a single cell using WGA.

  4. Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach

    Directory of Open Access Journals (Sweden)

    Tokurou Shimizu

    2017-12-01

    Full Text Available Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb and an N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping, indicated that the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The numbers of repeat elements and long terminal repeat retrotransposons were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.
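
    The N50 length quoted in this abstract is a standard assembly statistic: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size. A minimal sketch, with a hypothetical function name and toy lengths rather than the Satsuma scaffolds:

```python
def n50(lengths):
    """Length at which the longest sequences first cover half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer comparison avoids float rounding
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical).
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```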

  5. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  6. Efficient Breeding by Genomic Mating.

    Science.gov (United States)

    Akdemir, Deniz; Sánchez, Julio I

    2016-01-01

    Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationship (breeding value selection) or molecular markers (marker assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we propose an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem, and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that the proposed approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.

  7. Experimental Approaches to Study Genome Packaging of Influenza A Viruses

    Directory of Open Access Journals (Sweden)

    Catherine Isel

    2016-08-01

    Full Text Available The genome of influenza A viruses (IAV) consists of eight single-stranded negative sense viral RNAs (vRNAs) encapsidated into viral ribonucleoproteins (vRNPs). It is now well established that genome packaging (i.e., the incorporation of a set of eight distinct vRNPs into budding viral particles) follows a specific pathway guided by segment-specific cis-acting packaging signals on each vRNA. However, the precise nature and function of the packaging signals, and the mechanisms underlying the assembly of vRNPs into sub-bundles in the cytoplasm and their selective packaging at the viral budding site, remain largely unknown. Here, we review the diverse and complementary methods currently being used to elucidate these aspects of the viral cycle. They range from conventional and competitive reverse genetics, single molecule imaging of vRNPs by fluorescence in situ hybridization (FISH) and high-resolution electron microscopy and tomography of budding viral particles, to solely in vitro approaches to investigate vRNA-vRNA interactions at the molecular level.

  8. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Science.gov (United States)

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.

  9. Genome-wide DNA Methylation Profiling of Cell-Free Serum DNA in Esophageal Adenocarcinoma and Barrett Esophagus

    Directory of Open Access Journals (Sweden)

    Rihong Zhai

    2012-01-01

    Full Text Available Aberrant DNA methylation (DNAm) is a feature of most types of cancers. Genome-wide DNAm profiling has been performed successfully on tumor tissue DNA samples. However, the invasive procedure limits the utility of tumor tissue for epidemiological studies. While recent data indicate that cell-free circulating DNAm (cfDNAm) profiles reflect DNAm status in corresponding tumor tissues, no studies have examined the association of cfDNAm with cancer or precursors on a genome-wide scale. The objective of this pilot study was to evaluate the putative significance of genome-wide cfDNAm profiles in esophageal adenocarcinoma (EA) and Barrett esophagus (BE, EA precursor). We performed genome-wide DNAm profiling in EA tissue DNA (n = 8) and matched serum DNA (n = 8), in serum DNA of BE (n = 10), and in healthy controls (n = 10) using the Infinium HumanMethylation27 BeadChip that covers 27,578 CpG loci in 14,495 genes. We found that cfDNAm profiles were highly correlated to DNAm profiles in matched tumor tissue DNA (r = 0.92) in patients with EA. We selected the most differentially methylated loci to perform hierarchical clustering analysis. We found that 911 loci can discriminate perfectly between EA and control samples, 554 loci can separate EA from BE samples, and 46 loci can distinguish BE from control samples. These results suggest that genome-wide cfDNAm profiles are highly consistent with DNAm profiles detected in corresponding tumor tissues. Differential cfDNAm profiling may be a useful approach for the noninvasive screening of EA and EA premalignant lesions.

  10. Coarse-to-fine region selection and matching

    KAUST Repository

    Yang, Yanchao

    2015-10-15

    We present a new approach to wide baseline matching. We propose to use a hierarchical decomposition of the image domain and coarse-to-fine selection of regions to match. In contrast to interest point matching methods, which sample salient regions to reduce the cost of comparing all regions in two images, our method eliminates regions systematically to achieve efficiency. One advantage of our approach is that it is not restricted to covariant salient regions, a restriction that is too limiting under large viewpoint changes and leads to few corresponding regions. Affine invariant matching of regions in the hierarchy is achieved efficiently by a coarse-to-fine search of the affine space. Experiments on two benchmark datasets show that our method finds more correct correspondences between images (with fewer false alarms) than other wide baseline methods under large viewpoint changes. © 2015 IEEE.

  11. The benefits of a laparoscopic approach in ileal pouch anal anastomosis formation: a single institutional retrospective case-matched experience.

    LENUS (Irish Health Repository)

    Kelly, J

    2010-06-01

    A laparoscopic approach to ileoanal pouch formation is novel. By using prospectively gathered data, laparoscopic and open restorative proctocolectomy procedures in mucosal ulcerative colitis (UC) and familial adenomatous polyposis (FAP) patients were compared using a case-matched design.

  12. A pan-genomic approach to understand the basis of host adaptation in Achromobacter.

    Science.gov (United States)

    Jeukens, J; Freschi, L; Vincent, A T; Emond-Rheault, J G; Kukavica-Ibrulj, I; Charette, S J; Levesque, R C

    2017-04-05

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis (CF) lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the CF lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of A. xylosoxidans, A. insuavis, A. dolens and A. ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared to other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus's resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. Genome projects and the functional-genomic era.

    Science.gov (United States)

    Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans

    2005-12-01

    The problems we face today in public health as a result of the (fortunately) increasing age of people and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode the genomic information of humans, but also of other key organisms. The basic comprehension of genomic information (and its transfer) should now give us the possibility to pursue the next important step in life science, eventually leading to a basic understanding of biological information flow: the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and the response to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes on different levels such as RNA, protein or the cell in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. A particular emphasis is put on technology developments that can provide means to accomplish the tasks of future lines of functional genomics.

  14. A universal genomic coordinate translator for comparative genomics.

    Science.gov (United States)

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which grows as N^2 with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across
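
    The idea of translating coordinates through a graph of pairwise alignments, rather than aligning every genome to every other, can be illustrated with a minimal sketch. This is not Kraken's implementation or interface: the block format, the genome names, the coordinates and the function names below are all hypothetical, and real synteny maps also carry chromosomes, strands and gaps, which are omitted here.

```python
from collections import deque

# Hypothetical pairwise synteny maps (illustration only). Each block maps the
# half-open interval [src_start, src_end) on one genome to an interval of the
# same length starting at dst_start on another genome.
PAIRWISE_BLOCKS = {
    ("human", "mouse"): [(100, 200, 1100), (300, 400, 5000)],
    ("mouse", "rat"): [(1000, 1200, 70), (5000, 5100, 900)],
}


def build_graph(blocks):
    """Store every block in both directions so the graph is undirected."""
    graph = {}
    for (src, dst), intervals in blocks.items():
        for s_start, s_end, d_start in intervals:
            length = s_end - s_start
            graph.setdefault(src, {}).setdefault(dst, []).append(
                (s_start, s_end, d_start))
            graph.setdefault(dst, {}).setdefault(src, []).append(
                (d_start, d_start + length, s_start))
    return graph


def translate_once(graph, src, dst, pos):
    """Translate a position across one pairwise map, or return None."""
    for start, end, d_start in graph.get(src, {}).get(dst, []):
        if start <= pos < end:
            return d_start + (pos - start)
    return None


def translate(graph, src, dst, pos):
    """Breadth-first search over genomes, carrying the coordinate along."""
    queue = deque([(src, pos)])
    seen = {src}
    while queue:
        genome, coord = queue.popleft()
        if genome == dst:
            return coord
        for neighbour in graph.get(genome, {}):
            if neighbour in seen:
                continue
            mapped = translate_once(graph, genome, neighbour, coord)
            if mapped is not None:
                seen.add(neighbour)
                queue.append((neighbour, mapped))
    return None
```

    In this toy setting no human-to-rat map exists, yet `translate(build_graph(PAIRWISE_BLOCKS), "human", "rat", 150)` can still be answered by going through mouse, which is why only a number of pairwise maps proportional to N is needed.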

  15. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.

    Directory of Open Access Journals (Sweden)

    Balázs Brankovics

    2016-06-01

    Full Text Available GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).

  16. Interpreting physical performance in professional soccer match-play: should we be more pragmatic in our approach?

    Science.gov (United States)

    Carling, Christopher

    2013-08-01

    Academic and practitioner interest in the physical performance of male professional soccer players in the competition setting determined via time-motion analyses has grown substantially over the last four decades leading to a substantial body of published research and aiding development of a more systematic evidence-based framework for physical conditioning. Findings have forcibly shaped contemporary opinions in the sport with researchers and practitioners frequently emphasising the important role that physical performance plays in match outcomes. Time-motion analyses have also influenced practice as player conditioning programmes can be tailored according to the different physical demands identified across individual playing positions. Yet despite a more systematic approach to physical conditioning, data indicate that even at the very highest standards of competition, the contemporary player is still susceptible to transient and end-game fatigue. Over the course of this article, the author suggests that a more pragmatic approach to interpreting the current body of time-motion analysis data and its application in the practical setting is nevertheless required. Examples of this are addressed using findings in the literature to examine (a) the association between competitive physical performance and 'success' in professional soccer, (b) current approaches to interpreting differences in time-motion analysis data across playing positions, and (c) whether data can realistically be used to demonstrate the occurrence of fatigue in match-play. Gaps in the current literature and directions for future research are also identified.

  17. A Quantitative Genomic Approach for Analysis of Fitness and Stress Related Traits in a Drosophila melanogaster Model Population

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Krag, Kristian; Loeschcke, Volker

    2016-01-01

    The ability of natural populations to withstand environmental stresses relies partly on their adaptive ability. In this study, we used a subset of the Drosophila Genetic Reference Panel, a population of inbred, genome-sequenced lines derived from a natural population of Drosophila melanogaster, to investigate whether this population harbors genetic variation for a set of stress resistance and life history traits. Using a genomic approach, we found substantial genetic variation for metabolic rate, heat stress resistance, expression of a major heat shock protein, and egg-to-adult viability investigated at a benign and a higher stressful temperature. This suggests that these traits will be able to evolve. In addition, we outline an approach to conduct pathway associations based on genomic linear models, which has potential to identify adaptive genes and pathways, and therefore can be a valuable tool...

  18. Role of Shwachman-Bodian-Diamond syndrome protein in translation machinery and cell chemotaxis: a comparative genomics approach

    Directory of Open Access Journals (Sweden)

    Vasieva O

    2011-09-01

    Full Text Available Olga Vasieva. Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom; Fellowship for the Interpretation of Genomes, Burr Ridge, IL, USA. Abstract: Shwachman-Bodian-Diamond syndrome (SBDS) is linked to a mutation in a single gene. The SBDS protein is involved in RNA metabolism and ribosome-associated functions, but SBDS mutation is primarily linked to a defect in polymorphonuclear leukocytes unable to orient correctly in a spatial gradient of chemoattractants. Results of data mining and comparative genomic approaches undertaken in this study suggest that SBDS protein is also linked to tRNA metabolism and translation initiation. Analysis of crosstalk between translation machinery and cytoskeletal dynamics provides new insights into the cellular chemotactic defects caused by SBDS protein malfunction. The proposed functional interactions provide a new approach to exploit potential targets in the treatment and monitoring of this disease. Keywords: Shwachman-Bodian-Diamond syndrome, wybutosine, tRNA, chemotaxis, translation, genomics, gene proximity

  19. A new generation of cancer genome diagnostics for routine clinical use: overcoming the roadblocks to personalized cancer medicine.

    Science.gov (United States)

    Heuckmann, J M; Thomas, R K

    2015-09-01

    The identification of 'druggable' kinase gene alterations has revolutionized cancer treatment in the last decade by providing new, successfully targetable drug targets. Thus, genotyping tumors to match the right patients with the right drugs has become a clinical routine. Today, advances in sequencing technology and computational genome analyses enable the discovery of a constantly growing number of genome alterations relevant for clinical decision making. As a consequence, several technological approaches have emerged in order to deal with these rapidly increasing demands for clinical cancer genome analyses. Here, we describe challenges on the path to the broad introduction of diagnostic cancer genome analyses and the technologies that can be applied to overcome them. We define three generations of molecular diagnostics that are in clinical use. The latest generation of these approaches involves deep, and thus highly sensitive, sequencing of all therapeutically relevant types of genome alterations (mutations, copy number alterations and rearrangements/fusions) in a single assay. Such approaches therefore have substantial advantages (less time and less tissue required) over PCR-based methods that typically have to be combined with fluorescence in situ hybridization for detection of gene amplifications and fusions. Since these new technologies work reliably on routine diagnostic formalin-fixed, paraffin-embedded specimens, they can help expedite the broad introduction of personalized cancer therapy into the clinic by providing comprehensive, sensitive and accurate cancer genome diagnoses in 'real-time'. © The Author 2015. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  20. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  1. Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes.

    Science.gov (United States)

    Thybert, David; Roller, Maša; Navarro, Fábio C P; Fiddes, Ian; Streeter, Ian; Feig, Christine; Martin-Galvez, David; Kolmogorov, Mikhail; Janoušek, Václav; Akanni, Wasiu; Aken, Bronwen; Aldridge, Sarah; Chakrapani, Varshith; Chow, William; Clarke, Laura; Cummins, Carla; Doran, Anthony; Dunn, Matthew; Goodstadt, Leo; Howe, Kerstin; Howell, Matthew; Josselin, Ambre-Aurore; Karn, Robert C; Laukaitis, Christina M; Jingtao, Lilue; Martin, Fergal; Muffato, Matthieu; Nachtweide, Stefanie; Quail, Michael A; Sisu, Cristina; Stanke, Mario; Stefflova, Klara; Van Oosterhout, Cock; Veyrunes, Frederic; Ward, Ben; Yang, Fengtang; Yazdanifar, Golbahar; Zadissa, Amonida; Adams, David J; Brazma, Alvis; Gerstein, Mark; Paten, Benedict; Pham, Son; Keane, Thomas M; Odom, Duncan T; Flicek, Paul

    2018-04-01

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million years ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology. © 2018 Thybert et al.; Published by Cold Spring Harbor Laboratory Press.

  2. Security and matching of partial fingerprint recognition systems

    Science.gov (United States)

    Jea, Tsai-Yang; Chavan, Viraj S.; Govindaraju, Venu; Schneider, John K.

    2004-08-01

    Despite advances in fingerprint identification techniques, matching incomplete or partial fingerprints still poses a difficult challenge. While the introduction of compact silicon chip-based sensors that capture only a part of the fingerprint area has made this problem important from a commercial perspective, there is also considerable interest in the topic for processing partial and latent fingerprints obtained at crime scenes. Attempts to match partial fingerprints using singular ridge structures-based alignment techniques fail when the partial print does not include such structures (e.g., core or delta). We present a multi-path fingerprint matching approach that utilizes localized secondary features derived using only the relative information of minutiae. Since the minutia-based fingerprint representation is an ANSI-NIST standard, our approach has the advantage of being directly applicable to already existing databases. We also analyze the vulnerability of partial fingerprint identification systems to brute force attacks. The described matching approach has been tested on FVC2002's DB1 database. The experimental results show that our approach achieves an equal error rate of 1.25% and a total error rate of 1.8% (with FAR at 0.2% and FRR at 1.6%).
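
    The error figures quoted above are related by simple arithmetic: the total error rate is the sum of the false accept rate and the false reject rate at one operating threshold (0.2% + 1.6% = 1.8%), while the equal error rate is the value at the threshold where the two coincide. The following sketch shows how such rates could be computed from lists of match scores. The scores, the accept rule (score at or above the threshold) and the function names are assumptions made for illustration and are not taken from the paper.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when a comparison is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def approximate_eer(genuine, impostor):
    """Scan the observed scores as candidate thresholds.

    Returns (threshold, rate) at the point where FAR and FRR are closest;
    the rate reported is the mean of the two at that threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Hypothetical similarity scores, higher meaning a better match.
GENUINE_SCORES = [0.9, 0.8, 0.75, 0.6, 0.4]
IMPOSTOR_SCORES = [0.1, 0.2, 0.3, 0.5, 0.35]
```

    With real data the score lists would hold thousands of genuine and impostor comparisons, and the threshold scan would usually be replaced by interpolation along the ROC curve.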

  3. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    Energy Technology Data Exchange (ETDEWEB)

    Brown, Steven D [ORNL; Nagaraju, Shilpa [LanzaTech; Utturkar, Sagar M [ORNL; De Tissera, Sashini [LanzaTech; Segovia, Simón [LanzaTech; Mitchell, Wayne [LanzaTech; Land, Miriam L [ORNL; Dassanayake, Asela [LanzaTech; Köpke, Michael [LanzaTech

    2014-01-01

    Background: Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results: A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (as an additional hydrogenase and the absence of a

  4. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    DEFF Research Database (Denmark)

    Zhan, Bujie; Fadista, João; Thomsen, Bo

    2011-01-01

    Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high resolution mapping of diverse classes of genomic variation...

  5. Pattern recognition and string matching

    CERN Document Server

    Cheng, Xiuzhen

    2002-01-01

    The research and development of pattern recognition have proven to be of importance in science, technology, and human activity. Many useful concepts and tools from different disciplines have been employed in pattern recognition. Among them is string matching, which receives much theoretical and practical attention. String matching is also an important topic in combinatorial optimization. This book is devoted to recent advances in pattern recognition and string matching. It consists of twenty-eight chapters written by different authors, addressing a broad range of topics such as those from classification, matching, mining, feature selection, and applications. Each chapter is self-contained, and presents either novel methodological approaches or applications of existing theories and techniques. The aim, intent, and motivation for publishing this book is to provide a reference tool for the increasing number of readers who depend upon pattern recognition or string matching in some way. This includes student...

  6. The 'morbid anatomy' of the human genome: tracing the observational and representational approaches of postwar genetics and biomedicine. The William Bynum Prize Essay.

    Science.gov (United States)

    Hogan, Andrew J

    2014-07-01

    This paper explores evolving conceptions and depictions of the human genome among human and medical geneticists during the postwar period. Historians of science and medicine have shown significant interest in the use of informational approaches in postwar genetics, which treat the genome as an expansive digital data set composed of three billion DNA nucleotides. Since the 1950s, however, geneticists have largely interacted with the human genome at the microscopically visible level of chromosomes. Mindful of this, I examine the observational and representational approaches of postwar human and medical genetics. During the 1970s and 1980s, the genome increasingly came to be understood as, at once, a discrete part of the human anatomy and a standardised scientific object. This paper explores the role of influential medical geneticists in recasting the human genome as being a visible, tangible, and legible entity, which was highly relevant to traditional medical thinking and practice. I demonstrate how the human genome was established as an object amenable to laboratory and clinical research, and argue that the observational and representational approaches of postwar medical genetics reflect, more broadly, the interdisciplinary efforts underlying the development of contemporary biomedicine.

  7. GRAbB : Selective Assembly of Genomic Regions, a New Niche for Genomic Research

    NARCIS (Netherlands)

    Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often

  8. A BAC Library and Paired-PCR Approach to Mapping and Completing the Genome Sequence of Sulfolobus Solfataricus P2

    DEFF Research Database (Denmark)

    She, Qunxin; Confalonieri, F.; Zivanovic, Y.

    2000-01-01

    The original strategy used in the Sulfolobus solfataricus genome project was to sequence non-overlapping, or minimally overlapping, cosmid or lambda inserts without constructing a physical map. However, after only about two thirds of the genome sequence was completed, this approach became counter-productive because there was a high sequence bias in the cosmid and lambda libraries. Therefore, a new approach was devised for linking the sequenced regions which may be generally applicable. BAC libraries were constructed and terminal sequences of the clones were determined and used for both end mapping and PCR...

  9. PREDICTING THE MATCH OUTCOME IN ONE DAY INTERNATIONAL CRICKET MATCHES, WHILE THE GAME IS IN PROGRESS

    Directory of Open Access Journals (Sweden)

    Michael Bailey

    2006-12-01

    Full Text Available Millions of dollars are wagered on the outcome of one day international (ODI) cricket matches, with a large percentage of bets occurring after the game has commenced. Using match information gathered from all 2200 ODI matches played prior to January 2005, a range of variables that could independently explain statistically significant proportions of variation associated with the predicted run totals and match outcomes were created. Such variables include home ground advantage, past performances, match experience, performance at the specific venue, performance against the specific opposition, experience at the specific venue and current form. Using a multiple linear regression model, prediction variables were numerically weighted according to statistical significance and used to predict the match outcome. With the use of the Duckworth-Lewis method to determine resources remaining, at the end of each completed over, the predicted run total of the batting team could be updated to provide a more accurate prediction of the match outcome. By applying this prediction approach to a holdout sample of matches, the efficiency of the "in the run" wagering market could be assessed. Preliminary results suggest that the market is prone to overreact to events occurring throughout the course of the match, thus creating brief inefficiencies in the wagering market.

  10. Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis.

    Science.gov (United States)

    Cho, Seoae; Kim, Haseong; Oh, Sohee; Kim, Kyunga; Park, Taesung

    2009-12-15

    The current trend in genome-wide association studies is to identify regions where the true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. However, many challenges exist in detecting disease-causing genes among the thousands of SNPs. Examples include multicollinearity and multiple testing issues, especially when a large number of correlated SNPs are simultaneously tested. Multicollinearity can often occur when predictor variables in a multiple regression model are highly correlated, and can cause imprecise estimation of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that allows one to address multicollinearity. At Step 1, the single-marker association analysis was conducted to screen SNPs. At Step 2, the multiple-marker association was scanned based on the elastic-net regularization. The proposed approach was applied to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the selected SNPs at the screening step are located mostly on chromosome 6, the elastic-net approach identified putative RA-related SNPs on other chromosomes in an increased proportion. For some of those putative RA-related SNPs, we identified the interactions with sex, a well known factor affecting RA susceptibility.
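
    The two-step procedure described above (single-marker screening followed by elastic-net regularization) can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it uses a squared-error loss on a quantitative trait instead of a case-control model, screens by absolute correlation, and fits the elastic net by plain coordinate descent. The function names, the simulated data and the values of `lam` and `alpha` are all assumptions.

```python
import random


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso part of the penalty)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def screen(columns, y, keep):
    """Step 1: keep the SNPs with the largest absolute correlation with y."""
    yc = center(y)
    syy = sum(b * b for b in yc)

    def score(col):
        xc = center(col)
        sxx = sum(a * a for a in xc)
        if sxx == 0 or syy == 0:
            return 0.0
        return abs(sum(a * b for a, b in zip(xc, yc))) / (sxx * syy) ** 0.5

    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j]),
                    reverse=True)
    return sorted(ranked[:keep])


def elastic_net(columns, y, lam, alpha, n_iter=200):
    """Step 2: coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*|b|_2^2)."""
    n = len(y)
    cols = [center(c) for c in columns]
    resid = center(y)  # residual y - Xb, with b starting at zero
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            sq = sum(a * a for a in col) / n
            if sq == 0:
                continue
            # Covariance of SNP j with the residual that excludes SNP j.
            rho = sum(a * (r + a * beta[j]) for a, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (sq + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - a * delta for a, r in zip(col, resid)]
                beta[j] = new
    return beta


def simulate(n=200, p=30, seed=0):
    """Toy data: genotypes coded 0/1/2; SNPs 3 and 7 affect the trait."""
    rng = random.Random(seed)
    columns = [[rng.choice((0, 1, 2)) for _ in range(n)] for _ in range(p)]
    y = [1.0 * columns[3][i] + 0.8 * columns[7][i] + rng.gauss(0, 0.5)
         for i in range(n)]
    return columns, y
```

    A typical use would be `kept = screen(cols, y, keep=5)` followed by `elastic_net([cols[j] for j in kept], y, lam=0.1, alpha=0.5)`. Because of the ridge term in the penalty, correlated SNPs tend to share weight rather than have one of them dropped arbitrarily, which is the property the abstract relies on to handle multicollinearity.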

  11. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    Directory of Open Access Journals (Sweden)

    Simon Boitard

    2016-03-01

    Full Text Available Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
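
    To make concrete why the folded allele frequency spectrum needs neither phasing nor knowledge of the ancestral allele, here is a minimal sketch of how it could be computed from diploid genotype counts. It is not code from PopSizeABC: the input layout (one list of 0/1/2 allele counts per SNP) and the function name are assumptions made for illustration.

```python
def folded_afs(genotypes):
    """Folded allele frequency spectrum for diploid, unphased genotypes.

    genotypes: one list per SNP, holding for each individual the number of
    copies (0, 1 or 2) of an arbitrarily chosen allele.
    Returns a list `afs` where afs[k] is the number of SNPs whose minor
    allele is carried by k of the 2n sampled chromosomes (k = 0..n).
    """
    if not genotypes:
        return []
    n = len(genotypes[0])
    afs = [0] * (n + 1)
    for snp in genotypes:
        count = sum(snp)
        # Folding: the rarer of the two alleles defines the class, so the
        # result does not depend on which allele was counted.
        minor = min(count, 2 * n - count)
        afs[minor] += 1
    return afs


# Hypothetical toy sample: 5 SNPs typed in 3 individuals.
TOY_GENOTYPES = [
    [0, 1, 0],
    [2, 2, 1],
    [1, 1, 1],
    [0, 0, 0],
    [2, 2, 2],
]
```

    For the toy sample the first two SNPs fall in the same class (one minor allele copy each) even though they were counted with opposite alleles, which is the point of folding. Restricting the spectrum to the higher classes corresponds to using only SNPs with common alleles, the setting in which the abstract reports robustness to sequencing errors.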

  12. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

    Directory of Open Access Journals (Sweden)

    Beatson Scott A

    2011-08-01

    Full Text Available Abstract. Background: Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results: BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons

  13. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.

    Science.gov (United States)

    Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A

    2011-08-08

    Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. There is a clear need for a user

  14. Anterior Versus Posterior Approach for Multilevel Degenerative Cervical Disease: A Retrospective Propensity Score-Matched Study of the MarketScan Database.

    Science.gov (United States)

    Cole, Tyler; Veeravagu, Anand; Zhang, Michael; Azad, Tej D; Desai, Atman; Ratliff, John K

    2015-07-01

    Retrospective 2:1 propensity score-matched analysis on a national longitudinal database between 2006 and 2010. To compare rates of adverse events, revision procedure rates, and payment differences in anterior cervical fusion procedures compared with posterior laminectomy and fusion procedures with at least 3 levels of instrumentation. The comparative benefits of anterior versus posterior approach to multilevel degenerative cervical disease remain controversial. Recent systematic reviews have reached conflicting conclusions. We demonstrate the comparative economic and clinical outcomes of anterior and posterior approaches for multilevel cervical degenerative disk disease. We identified 13,662 patients in a national billing claims database who underwent anterior or posterior cervical fusion procedures with 3 or more levels of instrumentation. Cohorts were balanced using 2:1 propensity score matching and outcomes were compared using bivariate analysis. With the exception of dysphagia (6.4% in anterior and 1.4% in posterior), overall 30-day complication rates were lower in the anterior approach group. The rate of any complication excluding dysphagia with anterior approaches was 12.3%, significantly lower (P disease provide clinical advantages over posterior approaches, including lower overall complication rates, revision procedure rates, and decreased length of stay. Anterior approach procedures are also associated with decreased overall payments. These findings must be interpreted in light of limitations inherent to retrospective longitudinal studies including absence of subjective and radiographical outcomes. 3.

  15. INVESTIGATIONS INTO MOLECULAR PATHWAYS IN THE POST GENOME ERA: CROSS SPECIES COMPARATIVE GENOMICS APPROACH

    Science.gov (United States)

    Genome sequencing efforts in the past decade were aimed at generating draft sequences of many prokaryotic and eukaryotic model organisms. Successful completion of unicellular eukaryotes, worm, fly and human genome have opened up the new field of molecular biology and function...

  16. Genomic and Functional Approaches to Understanding Cancer Aneuploidy

    NARCIS (Netherlands)

    Taylor, Alison M.; Shih, Juliann; Ha, Gavin; Gao, Galen F.; Zhang, Xiaoyang; Berger, Ashton C.; Schumacher, Steven E.; Wang, Chen; Hu, Hai; Liu, Jianfang; Lazar, Alexander J.; Caesar-Johnson, Samantha J.; Demchok, John A.; Felau, Ina; Kasapi, Melpomeni; Ferguson, Martin L.; Hutter, Carolyn M.; Sofia, Heidi J.; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean C.; Zhang, Jiashan (Julia); Chudamani, Sudha; Liu, Jia; Lolla, Laxmi; Naresh, Rashi; Pihl, Todd; Sun, Qiang; Wan, Yunhu; Wu, Ye; Cho, Juok; DeFreitas, Timothy; Frazer, Scott; Gehlenborg, Nils; Getz, Gad; Heiman, David I.; Kim, Jaegil; Lawrence, Michael S.; Lin, Pei; Meier, Sam; Noble, Michael S.; Saksena, Gordon; Voet, Doug; Zhang, Hailei; Bernard, Brady; Chambwe, Nyasha; Dhankani, Varsha; Knijnenburg, Theo; Kramer, Roger; Leinonen, Kalle; Liu, Yuexin; Miller, Michael; Reynolds, Sheila; Shmulevich, Ilya; Thorsson, Vesteinn; Zhang, Wei; Akbani, Rehan; Broom, Bradley M.; Hegde, Apurva M.; Ju, Zhenlin; Kanchi, Rupa S.; Korkut, Anil; Li, Jun; Liang, Han; Ling, Shiyun; Liu, Wenbin; Lu, Yiling; Mills, Gordon B.; Ng, Kwok Shing; Rao, Arvind; Ryan, Michael; Wang, Jing; Weinstein, John N.; Zhang, Jiexin; Abeshouse, Adam; Armenia, Joshua; Chakravarty, Debyani; Chatila, Walid K.; de Bruijn, Ino; Gao, Jianjiong; Gross, Benjamin E.; Heins, Zachary J.; Kundra, Ritika; La, Konnor; Ladanyi, Marc; Luna, Augustin; Nissan, Moriah G.; Ochoa, Angelica; Phillips, Sarah M.; Reznik, Ed; Sanchez-Vega, Francisco; Sander, Chris; Schultz, Nikolaus; Sheridan, Robert; Sumer, S. Onur; Sun, Yichao; Taylor, Barry S.; Wang, Jioajiao; Zhang, Hongxin; Anur, Pavana; Peto, Myron; Spellman, Paul; Benz, Christopher; Stuart, Joshua M.; Wong, Christopher K.; Yau, Christina; Hayes, D. 
Neil; Parker, Joel S.; Wilkerson, Matthew D.; Ally, Adrian; Balasundaram, Miruna; Bowlby, Reanne; Brooks, Denise; Carlsen, Rebecca; Chuah, Eric; Dhalla, Noreen; Holt, Robert; Jones, Steven J.M.; Kasaian, Katayoon; Lee, Darlene; Ma, Yussanne; Marra, Marco A.; Mayo, Michael; Moore, Richard A.; Mungall, Andrew J.; Mungall, Karen; Robertson, A. Gordon; Sadeghi, Sara; Schein, Jacqueline E.; Sipahimalani, Payal; Tam, Angela; Thiessen, Nina; Tse, Kane; Wong, Tina; Berger, Ashton C.; Beroukhim, Rameen; Cherniack, Andrew D.; Cibulskis, Carrie; Gabriel, Stacey B.; Gao, Galen F.; Ha, Gavin; Meyerson, Matthew; Schumacher, Steven E.; Shih, Juliann; Kucherlapati, Melanie H.; Kucherlapati, Raju S.; Baylin, Stephen; Cope, Leslie; Danilova, Ludmila; Bootwalla, Moiz S.; Lai, Phillip H.; Maglinte, Dennis T.; Van Den Berg, David J.; Weisenberger, Daniel J.; Auman, J. Todd; Balu, Saianand; Bodenheimer, Tom; Fan, Cheng; Hoadley, Katherine A.; Hoyle, Alan P.; Jefferys, Stuart R.; Jones, Corbin D.; Meng, Shaowu; Mieczkowski, Piotr A.; Mose, Lisle E.; Perou, Amy H.; Perou, Charles M.; Roach, Jeffrey; Shi, Yan; Simons, Janae V.; Skelly, Tara; Soloway, Matthew G.; Tan, Donghui; Veluvolu, Umadevi; Fan, Huihui; Hinoue, Toshinori; Laird, Peter W.; Shen, Hui; Zhou, Wanding; Bellair, Michelle; Chang, Kyle; Covington, Kyle; Creighton, Chad J.; Dinh, Huyen; Doddapaneni, Harsha Vardhan; Donehower, Lawrence A.; Drummond, Jennifer; Gibbs, Richard A.; Glenn, Robert; Hale, Walker; Han, Yi; Hu, Jianhong; Korchina, Viktoriya; Lee, Sandra; Lewis, Lora; Li, Wei; Liu, Xiuping; Morgan, Margaret; Morton, Donna; Muzny, Donna; Santibanez, Jireh; Sheth, Margi; Shinbrot, Eve; Wang, Linghua; Wang, Min; Wheeler, David A.; Xi, Liu; Zhao, Fengmei; Hess, Julian; Appelbaum, Elizabeth L.; Bailey, Matthew; Cordes, Matthew G.; Ding, Li; Fronick, Catrina C.; Fulton, Lucinda A.; Fulton, Robert S.; Kandoth, Cyriac; Mardis, Elaine R.; McLellan, Michael D.; Miller, Christopher A.; Schmidt, Heather K.; Wilson, Richard K.; Crain, 
Daniel; Curley, Erin; Gardner, Johanna; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Candace; Shelton, Troy; Sherman, Mark; Thompson, Eric; Yena, Peggy; Bowen, Jay; Gastier-Foster, Julie M.; Gerken, Mark; Leraas, Kristen M.; Lichtenberg, Tara M.; Ramirez, Nilsa C.; Wise, Lisa; Zmuda, Erik; Corcoran, Niall; Costello, Tony; Hovens, Christopher; Carvalho, Andre L.; de Carvalho, Ana C.; Fregnani, José H.; Longatto-Filho, Adhemar; Reis, Rui M.; Scapulatempo-Neto, Cristovam; Silveira, Henrique C.S.; Vidal, Daniel O.; Burnette, Andrew; Eschbacher, Jennifer; Hermes, Beth; Noss, Ardene; Singh, Rosy; Anderson, Matthew L.; Castro, Patricia D.; Ittmann, Michael; Huntsman, David; Kohl, Bernard; Le, Xuan; Thorp, Richard; Andry, Chris; Duffy, Elizabeth R.; Lyadov, Vladimir; Paklina, Oxana; Setdikova, Galiya; Shabunin, Alexey; Tavobilov, Mikhail; McPherson, Christopher; Warnick, Ronald; Berkowitz, Ross; Cramer, Daniel; Feltmate, Colleen; Horowitz, Neil; Kibel, Adam; Muto, Michael; Raut, Chandrajit P.; Malykh, Andrei; Barnholtz-Sloan, Jill S.; Barrett, Wendi; Devine, Karen; Fulop, Jordonna; Ostrom, Quinn T.; Shimmel, Kristen; Wolinsky, Yingli; Sloan, Andrew E.; De Rose, Agostino; Giuliante, Felice; Goodman, Marc; Karlan, Beth Y.; Hagedorn, Curt H.; Eckman, John; Harr, Jodi; Myers, Jerome; Tucker, Kelinda; Zach, Leigh Anne; Deyarmin, Brenda; Hu, Hai; Kvecher, Leonid; Larson, Caroline; Mural, Richard J.; Somiari, Stella; Vicha, Ales; Zelinka, Tomas; Bennett, Joseph; Iacocca, Mary; Rabeno, Brenda; Swanson, Patricia; Latour, Mathieu; Lacombe, Louis; Têtu, Bernard; Bergeron, Alain; McGraw, Mary; Staugaitis, Susan M.; Chabot, John; Hibshoosh, Hanina; Sepulveda, Antonia; Su, Tao; Wang, Timothy; Potapova, Olga; Voronina, Olga; Desjardins, Laurence; Mariani, Odette; Roman-Roman, Sergio; Sastre, Xavier; Stern, Marc Henri; Cheng, Feixiong; Signoretti, Sabina; Berchuck, Andrew; Bigner, Darell; Lipp, Eric; Marks, Jeffrey; McCall, Shannon; McLendon, 
Roger; Secord, Angeles; Sharp, Alexis; Behera, Madhusmita; Brat, Daniel J.; Chen, Amy; Delman, Keith; Force, Seth; Khuri, Fadlo; Magliocca, Kelly; Maithel, Shishir; Olson, Jeffrey J.; Owonikoko, Taofeek; Pickens, Alan; Ramalingam, Suresh; Shin, Dong M.; Sica, Gabriel; Van Meir, Erwin G.; Zhang, Hongzheng; Eijckenboom, Wil; Gillis, Ad; Korpershoek, Esther; Looijenga, Leendert; Oosterhuis, Wolter; Stoop, Hans; van Kessel, Kim E.; Zwarthoff, Ellen C.; Calatozzolo, Chiara; Cuppini, Lucia; Cuzzubbo, Stefania; DiMeco, Francesco; Finocchiaro, Gaetano; Mattei, Luca; Perin, Alessandro; Pollo, Bianca; Chen, Chu; Houck, John; Lohavanichbutr, Pawadee; Hartmann, Arndt; Stoehr, Christine; Stoehr, Robert; Taubert, Helge; Wach, Sven; Wullich, Bernd; Kycler, Witold; Murawa, Dawid; Wiznerowicz, Maciej; Chung, Ki; Edenfield, W. Jeffrey; Martin, Julie; Baudin, Eric; Bubley, Glenn; Bueno, Raphael; De Rienzo, Assunta; Richards, William G.; Kalkanis, Steven; Mikkelsen, Tom; Noushmehr, Houtan; Scarpace, Lisa; Girard, Nicolas; Aymerich, Marta; Campo, Elias; Giné, Eva; Guillermo, Armando López; Van Bang, Nguyen; Hanh, Phan Thi; Phu, Bui Duc; Tang, Yufang; Colman, Howard; Evason, Kimberley; Dottino, Peter R.; Martignetti, John A.; Gabra, Hani; Juhl, Hartmut; Akeredolu, Teniola; Stepa, Serghei; Hoon, Dave; Ahn, Keunsoo; Kang, Koo Jeong; Beuschlein, Felix; Breggia, Anne; Birrer, Michael; Bell, Debra; Borad, Mitesh; Bryce, Alan H.; Castle, Erik; Chandan, Vishal; Cheville, John; Copland, John A.; Farnell, Michael; Flotte, Thomas; Giama, Nasra; Ho, Thai; Kendrick, Michael; Kocher, Jean Pierre; Kopp, Karla; Moser, Catherine; Nagorney, David; O'Brien, Daniel; O'Neill, Brian Patrick; Patel, Tushar; Petersen, Gloria; Que, Florencia; Rivera, Michael; Roberts, Lewis; Smallridge, Robert; Smyrk, Thomas; Stanton, Melissa; Thompson, R. 
Houston; Torbenson, Michael; Yang, Ju Dong; Zhang, Lizhi; Brimo, Fadi; Ajani, Jaffer A.; Angulo Gonzalez, Ana Maria; Behrens, Carmen; Bondaruk, Jolanta; Broaddus, Russell; Czerniak, Bogdan; Esmaeli, Bita; Fujimoto, Junya; Gershenwald, Jeffrey; Guo, Charles; Lazar, Alexander J.; Logothetis, Christopher; Meric-Bernstam, Funda; Moran, Cesar; Ramondetta, Lois; Rice, David; Sood, Anil; Tamboli, Pheroze; Thompson, Timothy; Troncoso, Patricia; Tsao, Anne; Wistuba, Ignacio; Carter, Candace; Haydu, Lauren; Hersey, Peter; Jakrot, Valerie; Kakavand, Hojabr; Kefford, Richard; Lee, Kenneth; Long, Georgina; Mann, Graham; Quinn, Michael; Saw, Robyn; Scolyer, Richard; Shannon, Kerwin; Spillane, Andrew; Stretch, Jonathan; Synott, Maria; Thompson, John; Wilmott, James; Al-Ahmadie, Hikmat; Chan, Timothy A.; Ghossein, Ronald; Gopalan, Anuradha; Levine, Douglas A.; Reuter, Victor; Singer, Samuel; Singh, Bhuvanesh; Tien, Nguyen Viet; Broudy, Thomas; Mirsaidi, Cyrus; Nair, Praveen; Drwiega, Paul; Miller, Judy; Smith, Jennifer; Zaren, Howard; Park, Joong Won; Hung, Nguyen Phi; Kebebew, Electron; Linehan, W. 
Marston; Metwalli, Adam R.; Pacak, Karel; Pinto, Peter A.; Schiffman, Mark; Schmidt, Laura S.; Vocke, Cathy D.; Wentzensen, Nicolas; Worrell, Robert; Yang, Hannah; Moncrieff, Marc; Goparaju, Chandra; Melamed, Jonathan; Pass, Harvey; Botnariuc, Natalia; Caraman, Irina; Cernat, Mircea; Chemencedji, Inga; Clipca, Adrian; Doruc, Serghei; Gorincioi, Ghenadie; Mura, Sergiu; Pirtac, Maria; Stancul, Irina; Tcaciuc, Diana; Albert, Monique; Alexopoulou, Iakovina; Arnaout, Angel; Bartlett, John; Engel, Jay; Gilbert, Sebastien; Parfitt, Jeremy; Sekhon, Harman; Thomas, George; Rassl, Doris M.; Rintoul, Robert C.; Bifulco, Carlo; Tamakawa, Raina; Urba, Walter; Hayward, Nicholas; Timmers, Henri; Antenucci, Anna; Facciolo, Francesco; Grazi, Gianluca; Marino, Mirella; Merola, Roberta; de Krijger, Ronald; Gimenez-Roqueplo, Anne Paule; Piché, Alain; Chevalier, Simone; McKercher, Ginette; Birsoy, Kivanc; Barnett, Gene; Brewer, Cathy; Farver, Carol; Naska, Theresa; Pennell, Nathan A.; Raymond, Daniel; Schilero, Cathy; Smolenski, Kathy; Williams, Felicia; Morrison, Carl; Borgia, Jeffrey A.; Liptay, Michael J.; Pool, Mark; Seder, Christopher W.; Junker, Kerstin; Omberg, Larsson; Dinkin, Mikhail; Manikhas, George; Alvaro, Domenico; Bragazzi, Maria Consiglia; Cardinale, Vincenzo; Carpino, Guido; Gaudio, Eugenio; Chesla, David; Cottingham, Sandra; Dubina, Michael; Moiseenko, Fedor; Dhanasekaran, Renumathy; Becker, Karl Friedrich; Janssen, Klaus Peter; Slotta-Huspenina, Julia; Abdel-Rahman, Mohamed H.; Aziz, Dina; Bell, Sue; Cebulla, Colleen M.; Davis, Amy; Duell, Rebecca; Elder, J. 
Bradley; Hilty, Joe; Kumar, Bahavna; Lang, James; Lehman, Norman L.; Mandt, Randy; Nguyen, Phuong; Pilarski, Robert; Rai, Karan; Schoenfield, Lynn; Senecal, Kelly; Wakely, Paul; Hansen, Paul; Lechan, Ronald; Powers, James; Tischler, Arthur; Grizzle, William E.; Sexton, Katherine C.; Kastl, Alison; Henderson, Joel; Porten, Sima; Waldmann, Jens; Fassnacht, Martin; Asa, Sylvia L.; Schadendorf, Dirk; Couce, Marta; Graefen, Markus; Huland, Hartwig; Sauter, Guido; Schlomm, Thorsten; Simon, Ronald; Tennstedt, Pierre; Olabode, Oluwole; Nelson, Mark; Bathe, Oliver; Carroll, Peter R.; Chan, June M.; Disaia, Philip; Glenn, Pat; Kelley, Robin K.; Landen, Charles N.; Phillips, Joanna; Prados, Michael; Simko, Jeffry; Smith-McCune, Karen; VandenBerg, Scott; Roggin, Kevin; Fehrenbach, Ashley; Kendler, Ady; Sifri, Suzanne; Steele, Ruth; Jimeno, Antonio; Carey, Francis; Forgie, Ian; Mannelli, Massimo; Carney, Michael; Hernandez, Brenda; Campos, Benito; Herold-Mende, Christel; Jungk, Christin; Unterberg, Andreas; von Deimling, Andreas; Bossler, Aaron; Galbraith, Joseph; Jacobus, Laura; Knudson, Michael; Knutson, Tina; Ma, Deqin; Milhem, Mohammed; Sigmund, Rita; Godwin, Andrew K.; Madan, Rashna; Rosenthal, Howard G.; Adebamowo, Clement; Adebamowo, Sally N.; Boussioutas, Alex; Beer, David; Giordano, Thomas; Mes-Masson, Anne Marie; Saad, Fred; Bocklage, Therese; Landrum, Lisa; Mannel, Robert; Moore, Kathleen; Moxley, Katherine; Postier, Russel; Walker, Joan; Zuna, Rosemary; Feldman, Michael; Valdivieso, Federico; Dhir, Rajiv; Luketich, James; Mora Pinero, Edna M.; Quintero-Aguilo, Mario; Carlotti, Carlos Gilberto; Dos Santos, Jose Sebastião; Kemp, Rafael; Sankarankuty, Ajith; Tirapelli, Daniela; Catto, James; Agnew, Kathy; Swisher, Elizabeth; Creaney, Jenette; Robinson, Bruce; Shelley, Carl Simon; Godwin, Eryn M.; Kendall, Sara; Shipman, Cassaundra; Bradford, Carol; Carey, Thomas; Haddad, Andrea; Moyer, Jeffey; Peterson, Lisa; Prince, Mark; Rozek, Laura; Wolf, Gregory; Bowman, Rayleen; 
Fong, Kwun M.; Yang, Ian; Korst, Robert; Rathmell, W. Kimryn; Fantacone-Campbell, J. Leigh; Hooke, Jeffrey A.; Kovatich, Albert J.; Shriver, Craig D.; DiPersio, John; Drake, Bettina; Govindan, Ramaswamy; Heath, Sharon; Ley, Timothy; Van Tine, Brian; Westervelt, Peter; Rubin, Mark A.; Lee, Jung Il; Aredes, Natália D.; Mariamidze, Armaz; Cherniack, Andrew D.; Beroukhim, Rameen; Meyerson, Matthew

    2018-01-01

    Aneuploidy, whole chromosome or chromosome arm imbalance, is a near-universal characteristic of human cancers. In 10,522 cancer genomes from The Cancer Genome Atlas, aneuploidy was correlated with TP53 mutation, somatic mutation rate, and expression of proliferation genes. Aneuploidy was

  17. Impedance matching through a single passive fractional element

    KAUST Repository

    Radwan, Ahmed Gomaa

    2012-07-01

    For the first time, a generalized admittance Smith chart theory is introduced to represent fractional order circuit elements. The principles of fractional order matching circuits are described. We show that for fractional order α < 1, a single parallel fractional element can match a wider range of load impedances as compared to its series counterpart. Several matching examples demonstrate the versatility of fractional order series and parallel element matching as compared to the conventional approach. © 2012 IEEE.

  18. Genome Investigations of Vector Competence in Aedes aegypti to Inform Novel Arbovirus Disease Control Approaches

    Directory of Open Access Journals (Sweden)

    David W. Severson

    2016-10-01

    Dengue (DENV), yellow fever, chikungunya, and Zika virus transmission to humans by a mosquito host is confounded by both intrinsic and extrinsic variables. Besides virulence factors of the individual arboviruses, likelihood of virus transmission is subject to variability in the genome of the primary mosquito vector, Aedes aegypti. The “vectorial capacity” of A. aegypti varies depending upon its density, biting rate, and survival rate, as well as its intrinsic ability to acquire, host and transmit a given arbovirus. This intrinsic ability is known as “vector competence”. Based on whole transcriptome analysis, several genes and pathways have been predicted to have an association with a susceptible or refractory response in A. aegypti to DENV infection. However, the functional genomics of vector competence of A. aegypti is not well understood, primarily due to lack of integrative approaches in genomic or transcriptomic studies. In this review, we focus on the present status of genomics studies of DENV vector competence in A. aegypti as limited information is available relative to the other arboviruses. We propose future areas of research needed to facilitate the integration of vector and virus genomics and environmental factors to work towards better understanding of vector competence and vectorial capacity in natural conditions.

  19. Allele coding in genomic evaluation

    DEFF Research Database (Denmark)

    Strandén, Ismo; Christensen, Ole Fredslund

    2011-01-01

    Genomic data are used in animal breeding to assist genetic evaluation. Several models to estimate genomic breeding values have been studied. In general, two approaches have been used. One approach estimates the marker effects first and then, genomic breeding values are obtained by summing marker...... effects. In the second approach, genomic breeding values are estimated directly using an equivalent model with a genomic relationship matrix. Allele coding is the method chosen to assign values to the regression coefficients in the statistical model. A common allele coding is zero for the homozygous...... genotype of the first allele, one for the heterozygote, and two for the homozygous genotype for the other allele. Another common allele coding changes these regression coefficients by subtracting a value from each marker such that the mean of regression coefficients is zero within each marker. We call...
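
    The two allele codings described in this abstract can be illustrated with a minimal sketch. It assumes genotypes are given as two-letter strings and that the value subtracted from each marker is the observed column mean; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def code_012(genotypes, first_allele="A"):
    """Code each genotype as 0, 1 or 2: the count of alleles that are
    not the first allele (0 = homozygous first allele, 1 = heterozygote,
    2 = homozygous other allele)."""
    return np.array(
        [[sum(a != first_allele for a in g) for g in row] for row in genotypes],
        dtype=float,
    )

def center_by_marker(coded):
    """Subtract one value per marker (here the observed column mean,
    an assumption) so each marker's coefficients average to zero."""
    return coded - coded.mean(axis=0)

# Toy data: 3 animals (rows) x 2 markers (columns); values are made up.
genotypes = [["AA", "AB"], ["AB", "BB"], ["BB", "BB"]]
coded = code_012(genotypes)
centered = center_by_marker(coded)
```

    Centering changes only the column offsets, so differences between animals within a marker are the same under both codings.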

  20. Indexes of large genome collections on a PC.

    Directory of Open Access Journals (Sweden)

    Agnieszka Danek

    The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with the use of existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, in exact and approximate matching model, against a collection of thousand(s) of genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16-32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, the exact matching queries (of average length 150 bp) are handled in average time of 39 µs and with up to 3 mismatches in 373 µs on the test PC with the index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online.
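
    The query described above (report every occurrence of a pattern in a collection of genomes, exactly or with a bounded number of mismatches) can be sketched with a naive scan. This is not MuGI's index or algorithm; it only illustrates what such a query returns. The function name and the toy sequences are hypothetical.

```python
def find_occurrences(pattern, genomes, max_mismatches=0):
    """Return (genome_name, position) for every window that differs from
    `pattern` in at most `max_mismatches` positions (substitutions only)."""
    hits = []
    m = len(pattern)
    for name, seq in genomes.items():
        for i in range(len(seq) - m + 1):
            mismatches = 0
            for a, b in zip(pattern, seq[i:i + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # this window already exceeds the budget
            if mismatches <= max_mismatches:
                hits.append((name, i))
    return hits

# Toy collection of two short sequences (made-up data).
genomes = {"g1": "ACGTACGT", "g2": "ACGAACGT"}
exact = find_occurrences("ACGT", genomes)
approx = find_occurrences("ACGT", genomes, max_mismatches=1)
```

    The scan costs time proportional to the total collection length per query, which is the cost an index is meant to avoid.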

  1. Broadband electrical impedance matching for piezoelectric ultrasound transducers.

    Science.gov (United States)

    Huang, Haiying; Paramo, Daniel

    2011-12-01

    This paper presents a systematic method for designing broadband electrical impedance matching networks for piezoelectric ultrasound transducers. The design process involves three steps: 1) determine the equivalent circuit of the unmatched piezoelectric transducer based on its measured admittance; 2) design a set of impedance matching networks using a computerized Smith chart; and 3) establish the simulation model of the matched transducer to evaluate the gain and bandwidth of the impedance matching networks. The effectiveness of the presented approach is demonstrated through the design, implementation, and characterization of impedance matching networks for a broadband acoustic emission sensor. The impedance matching network improved the power of the acquired signal by 9 times.

  2. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    Science.gov (United States)

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Photon signature analysis using template matching

    Energy Technology Data Exchange (ETDEWEB)

    Bradley, D.A., E-mail: d.a.bradley@surrey.ac.uk [Department of Physics, University of Surrey, Guildford GU2 7XH (United Kingdom); Hashim, S., E-mail: suhairul@utm.my [Department of Physics, Universiti Teknologi Malaysia, 81310 Skudai, Johor (Malaysia); Saripan, M.I. [Faculty of Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Wells, K. [Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH (United Kingdom); Dunn, W.L. [Department of Mechanical and Nuclear Engineering, Kansas State University, 3002 Rathbone Hall, Manhattan, KS 66506 (United States)

    2011-10-01

    We describe an approach to detect improvised explosive devices (IEDs) by using a template matching procedure. This approach relies on the signature due to backstreaming γ photons from various targets. In this work we have simulated cylindrical targets of aluminum, iron, copper, water and ammonium nitrate (nitrogen-rich fertilizer). We simulate 3.5 MeV source photons distributed on a plane inside a shielded area using Monte Carlo N-Particle (MCNP™) code version 5 (V5). The 3.5 MeV source gamma rays yield 511 keV peaks due to pair production and scattered gamma rays. In this work, we simulate capture of those photons that backstream, after impinging on the target element, toward a NaI detector. The captured backstreamed photons are expected to produce a unique spectrum that will become part of a simple signal processing recognition system based on the template matching method. Different elements were simulated using different sets of random numbers in the Monte Carlo simulation. To date, the sum of absolute differences (SAD) method has been used to match the template. In the examples investigated, template matching was found to detect all elements correctly.
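
    The sum of absolute differences (SAD) matching mentioned above can be sketched as follows. The spectra are made-up four-bin toy values, not simulation results, and the function names are hypothetical; the sketch assumes that spectra share the same binning and that the best match is the template with the smallest SAD.

```python
import numpy as np

def sad(spectrum, template):
    """Sum of absolute differences between two equally binned spectra."""
    s = np.asarray(spectrum, dtype=float)
    t = np.asarray(template, dtype=float)
    return float(np.abs(s - t).sum())

def match_template(spectrum, templates):
    """Return the name of the template with the smallest SAD."""
    return min(templates, key=lambda name: sad(spectrum, templates[name]))

# Hypothetical templates: counts per energy bin (illustrative numbers only).
templates = {
    "aluminum": [5, 9, 4, 1],
    "iron": [3, 7, 8, 2],
    "water": [8, 4, 2, 1],
}
measured = [3, 8, 7, 2]
best = match_template(measured, templates)
```

    In practice spectra would probably need normalising to a common total count before comparison, since SAD is sensitive to overall scale.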

  4. Gene-enriched draft genome of the cattle tick Rhipicephalus microplus: assembly by the hybrid Pacific Biosciences/Illumina approach enabled analysis of the highly repetitive genome.

    Science.gov (United States)

    Barrero, Roberto A; Guerrero, Felix D; Black, Michael; McCooke, John; Chapman, Brett; Schilkey, Faye; Pérez de León, Adalberto A; Miller, Robert J; Bruns, Sara; Dobry, Jason; Mikhaylenko, Galina; Stormo, Keith; Bell, Callum; Tao, Quanzhou; Bogden, Robert; Moolhuijzen, Paula M; Hunter, Adam; Bellgard, Matthew I

    2017-08-01

    The genome of the cattle tick Rhipicephalus microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp in length and consists of approximately 70% repetitive DNA. We report the draft assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genome. Our hybrid approach produced an assembly consisting of 2.0 Gbp represented in 195,170 scaffolds with an N50 of 60,284 bp. The Rmi v2.0 assembly is 51.46% repetitive with a large fraction of unclassified repeats, short interspersed elements, long interspersed elements and long terminal repeats. We identified 38,827 putative R. microplus gene loci, of which 24,758 were protein coding genes (≥100 amino acids). OrthoMCL comparative analysis against 11 selected species including insects and vertebrates identified 10,835 and 3,423 protein coding gene loci that are unique to R. microplus or common to both R. microplus and Ixodes scapularis ticks, respectively. We identified 191 microRNA loci, of which 168 have similarity to known miRNAs and 23 represent novel miRNA families. We identified the genomic loci of several highly divergent R. microplus esterases with sequence similarity to acetylcholinesterase. Additionally, we report the finding of a novel cytochrome P450 CYP41 homolog that shows similar protein folding structures to CYP41 proteins known to be involved in acaricide resistance. Copyright © 2017 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
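
    The N50 statistic quoted above is conventionally the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch under that standard definition follows; the scaffold lengths are toy values, not the Rmi v2.0 data, and the function name is hypothetical.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half
    of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one scaffold length")

# Toy scaffold lengths in bp (made-up data).
scaffold_lengths = [10, 20, 30, 40]
result = n50(scaffold_lengths)
```

    Here the two longest scaffolds (40 and 30) already cover 70 of 100 bp, so the N50 is 30.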

  5. The impact of post-genomics approaches in neurodegenerative demyelinating diseases: the case of Guillain-Barré syndrome.

    Science.gov (United States)

    Villar, Margarita; Mateos-Hernandez, Lourdes; de la Fuente, Jose

    2018-03-14

    Why does an autoimmune disease that is the main cause of acute neuromuscular paralysis worldwide still lack a well-characterized cause or an effective treatment? The existence of different clinical variants of the Guillain-Barré syndrome (GBS), coupled with the fact that a high number of pathogens can cause an infection that sometimes, but not always, precedes the development of the syndrome, confers a high degree of uncertainty for both prognosis and treatment. In the post-genomic era, the development of omics technologies for the high-throughput analysis of biological molecules is allowing the characterization of biological systems in a degree of depth unimaginable before. In this context, this work summarizes the application of post-genomics technologies to the study of GBS. We performed a structured search of bibliographic databases for peer-reviewed research literature to outline the state of the art with regard to the application of post-genomics technologies to the study of GBS. The quality of retrieved papers was assessed using standard tools and thirty-four were included in the review. To date, transcriptomics and proteomics have been the only post-genomics approaches applied to GBS study. Most of these studies have been performed on cerebrospinal fluid samples and only a few studies have been conducted with other samples such as serum, Schwann cells and human peripheral nerve. In the post-genomics era, transcriptomics and proteomics have shown the possibilities that omics technologies can offer for a better understanding of the immunological and pathological mechanisms involved in GBS and the identification of potential biomarkers, but these results have only shown the tip of the iceberg and there is still a long way to go to exploit the full potential that post-genomics approaches could offer to the study of GBS. The integration of different omics datasets through a systems biology approach could allow network-based analyses to describe the complexity and

  6. Robust Point Matching for Non-Rigid Shapes: A Relaxation Labeling Based Approach

    National Research Council Canada - National Science Library

    Zheng, Yefeng; Doermann, David S

    2004-01-01

    .... Based on this observation, we formulate point matching as a graph matching problem. Each point is a node in the graph, and two nodes are connected by an edge if their Euclidean distance is less...

  7. BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

    Science.gov (United States)

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the stage of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating the ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to get the first annotations the very next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310

  8. BG7: a new approach for bacterial genome annotation designed for next generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Pablo Pareja-Tobes

    Full Text Available BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the stage of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating the ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to get the first annotations the very next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future.

  9. Contig-Layout-Authenticator (CLA): A Combinatorial Approach to Ordering and Scaffolding of Bacterial Contigs for Comparative Genomics and Molecular Epidemiology.

    Science.gov (United States)

    Shaik, Sabiha; Kumar, Narender; Lankapalli, Aditya K; Tiwari, Sumeet K; Baddam, Ramani; Ahmed, Niyaz

    2016-01-01

    A wide variety of genome sequencing platforms have emerged in the recent past. High-throughput platforms like Illumina and 454 are essentially adaptations of the shotgun approach generating millions of fragmented single or paired sequencing reads. To reconstruct whole genomes, the reads have to be assembled into contigs, which often require further downstream processing. The contigs can be directly ordered according to a reference, scaffolded based on paired read information, or assembled using a combination of the two approaches. While the reference-based approach appears to mask strain-specific information, scaffolding based on paired-end information suffers when repetitive elements longer than the size of the sequencing reads are present in the genome. Sequencing technologies that produce long reads can solve the problems associated with repetitive elements but are not necessarily easily available to researchers. The most common high-throughput technology currently used is the Illumina short read platform. To improve upon the shortcomings associated with the construction of draft genomes with Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better assembled genomes. Moreover, CLA also hints at probable misassemblies and contaminations, for the users to cross-check before constructing the consensus draft. The CLA pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated and compared favorably with other widely-used scaffolding and ordering tools using both simulated and real sequence datasets. CLA is a user friendly tool that requires a single command line input to generate ordered scaffolds.

  10. The ‘Morbid Anatomy’ of the Human Genome: Tracing the Observational and Representational Approaches of Postwar Genetics and Biomedicine The William Bynum Prize Essay

    Science.gov (United States)

    Hogan, Andrew J.

    2014-01-01

    This paper explores evolving conceptions and depictions of the human genome among human and medical geneticists during the postwar period. Historians of science and medicine have shown significant interest in the use of informational approaches in postwar genetics, which treat the genome as an expansive digital data set composed of three billion DNA nucleotides. Since the 1950s, however, geneticists have largely interacted with the human genome at the microscopically visible level of chromosomes. Mindful of this, I examine the observational and representational approaches of postwar human and medical genetics. During the 1970s and 1980s, the genome increasingly came to be understood as, at once, a discrete part of the human anatomy and a standardised scientific object. This paper explores the role of influential medical geneticists in recasting the human genome as being a visible, tangible, and legible entity, which was highly relevant to traditional medical thinking and practice. I demonstrate how the human genome was established as an object amenable to laboratory and clinical research, and argue that the observational and representational approaches of postwar medical genetics reflect, more broadly, the interdisciplinary efforts underlying the development of contemporary biomedicine. PMID:25045177

  11. ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome.

    Directory of Open Access Journals (Sweden)

    Gary Hon

    2008-10-01

    Full Text Available Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.

  12. Molecular Concordance Between Primary Breast Cancer and Matched Metastases

    DEFF Research Database (Denmark)

    Krøigård, Anne Bruun; Larsen, Martin Jakob; Thomassen, Mads

    2016-01-01

    Clinical management of breast cancer is increasingly personalized and based on molecular profiling. Often, primary tumors are used as proxies for systemic disease at the time of recurrence. However, recent studies have revealed substantial discordances between primary tumors and metastases, both....... The purpose of this review is to illuminate the extent of cancer genome evolution through disease progression and the degree of molecular concordance between primary breast cancers and matched metastases. We present an overview of the most prominent studies investigating the expression of endocrine receptors......, transcriptomics, and genome aberrations in primary tumors and metastases. In conclusion, biopsy of metastatic lesions at recurrence of breast cancer is encouraged to provide optimal treatment of the disease. Furthermore, molecular profiling of metastatic tissue provides invaluable mechanistic insight...

  13. Computational approaches to identify functional genetic variants in cancer genomes

    DEFF Research Database (Denmark)

    Gonzalez-Perez, Abel; Mustonen, Ville; Reva, Boris

    2013-01-01

    The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.

  14. Punctuated evolution of prostate cancer genomes.

    Science.gov (United States)

    Baca, Sylvan C; Prandi, Davide; Lawrence, Michael S; Mosquera, Juan Miguel; Romanel, Alessandro; Drier, Yotam; Park, Kyung; Kitabayashi, Naoki; MacDonald, Theresa Y; Ghandi, Mahmoud; Van Allen, Eliezer; Kryukov, Gregory V; Sboner, Andrea; Theurillat, Jean-Philippe; Soong, T David; Nickerson, Elizabeth; Auclair, Daniel; Tewari, Ashutosh; Beltran, Himisha; Onofrio, Robert C; Boysen, Gunther; Guiducci, Candace; Barbieri, Christopher E; Cibulskis, Kristian; Sivachenko, Andrey; Carter, Scott L; Saksena, Gordon; Voet, Douglas; Ramos, Alex H; Winckler, Wendy; Cipicchio, Michelle; Ardlie, Kristin; Kantoff, Philip W; Berger, Michael F; Gabriel, Stacey B; Golub, Todd R; Meyerson, Matthew; Lander, Eric S; Elemento, Olivier; Getz, Gad; Demichelis, Francesca; Rubin, Mark A; Garraway, Levi A

    2013-04-25

    The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term "chromoplexy," frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. A hidden Markov model approach for determining expression from genomic tiling micro arrays

    Directory of Open Access Journals (Sweden)

    Krogh Anders

    2006-05-01

    Full Text Available Background: Genomic tiling micro arrays have great potential for identifying previously undiscovered coding as well as non-coding transcription. To date, however, analyses of these data have been performed in an ad hoc fashion. Results: We present a probabilistic procedure, ExpressHMM, that adaptively models tiling data prior to predicting expression on genomic sequence. A hidden Markov model (HMM) is used to model the distributions of tiling array probe scores in expressed and non-expressed regions. The HMM is trained on sets of probes mapped to regions of annotated expression and non-expression. Subsequently, prediction of transcribed fragments is made on tiled genomic sequence. The prediction is accompanied by an expression probability curve for visual inspection of the supporting evidence. We test ExpressHMM on data from the Cheng et al. (2005) tiling array experiments on ten human chromosomes [1]. Results can be downloaded and viewed from our web site [2]. Conclusion: The value of adaptive modelling of fluorescence scores prior to categorisation into expressed and non-expressed probes is demonstrated. Our results indicate that our adaptive approach is superior to the previous analysis in terms of nucleotide sensitivity and transfrag specificity.
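
    The idea of labelling probes as expressed or non-expressed with an HMM can be illustrated with a minimal sketch. This is not ExpressHMM itself: the two Gaussian emission distributions, the symmetric `stay` transition probability, the uniform start distribution and all function names are assumptions made for illustration, and the score list is assumed to be non-empty.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(scores, means, sds, stay=0.9):
    """Most likely state path for probe scores.

    State 0 = non-expressed, state 1 = expressed (hypothetical labels).
    means/sds hold the assumed Gaussian emission parameters per state.
    """
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform start distribution over the two states.
    prev = [math.log(0.5) + gaussian_logpdf(scores[0], means[s], sds[s])
            for s in range(2)]
    backpointers = []
    for x in scores[1:]:
        cur, ptr = [], []
        for s in range(2):
            cands = [prev[p] + (log_stay if p == s else log_switch)
                     for p in range(2)]
            best = cands.index(max(cands))
            cur.append(cands[best] + gaussian_logpdf(x, means[s], sds[s]))
            ptr.append(best)
        backpointers.append(ptr)
        prev = cur
    # Trace back from the best final state.
    state = prev.index(max(prev))
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


probe_scores = [0.1, -0.2, 0.0, 2.1, 1.9, 2.3, 0.2, -0.1]
print(viterbi(probe_scores, means=(0.0, 2.0), sds=(0.5, 0.5)))
```

    In a trained model the emission and transition parameters would be estimated from probes in annotated expressed and non-expressed regions rather than fixed by hand as they are here.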

  16. MATCHING ALTERNATIVE ADDRESSES: A SEMANTIC WEB APPROACH

    Directory of Open Access Journals (Sweden)

    S. Ariannamazi

    2015-12-01

    Full Text Available Rapid development of crowd-sourcing or volunteered geographic information (VGI) provides opportunities for authorities that deal with geospatial information. Heterogeneity of multiple data sources and inconsistency of data types are key characteristics of VGI datasets. The expansion of cities has resulted in a growing number of POIs in OpenStreetMap, a well-known VGI source, which causes the datasets to become outdated in short periods of time. These changes made to spatial and aspatial attributes of features such as names and addresses might cause confusion or ambiguity in processes that require a feature's literal information, like addressing and geocoding. VGI sources neither conform to specific vocabularies nor remain in a specific schema for a long period of time. As a result, the integration of VGI sources is crucial and inevitable in order to avoid duplication and the waste of resources. Information integration can be used to match features and qualify different annotation alternatives for disambiguation. This study enhances the search capabilities of geospatial tools with applications able to understand user terminology, in pursuit of an efficient way of finding desired results. The Semantic Web is a capable tool for developing technologies that deal with lexical and numerical calculations and estimations. There is a vast amount of literal-spatial data representing the capability of linguistic information in knowledge modeling, but these resources need to be harmonized based on Semantic Web standards. The process of making addresses homogeneous generates a helpful tool based on spatial data integration and lexical annotation matching and disambiguation.

  17. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)
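
    The search for hidden periodicities mentioned above can be sketched with a direct Fourier sum over the positions of one nucleotide. The normalisation by sequence length M and the names `structure_factor` and `strongest_harmonic` are assumptions for illustration, not the authors' definitions.

```python
import cmath
import math


def structure_factor(seq, base, n):
    """Squared Fourier amplitude of the indicator sequence of `base`
    at harmonic n (wave number q_n = 2*pi*n/M), divided by M."""
    m_len = len(seq)
    q = 2 * math.pi * n / m_len
    amp = sum(cmath.exp(-1j * q * m) for m, s in enumerate(seq) if s == base)
    return abs(amp) ** 2 / m_len


def strongest_harmonic(seq, base):
    """Harmonic number in 1..M/2 with the largest structure factor."""
    half = len(seq) // 2
    return max(range(1, half + 1), key=lambda n: structure_factor(seq, base, n))


seq = "ATG" * 20                       # toy sequence with an exact period of 3
n_peak = strongest_harmonic(seq, "A")
print(n_peak, len(seq) / n_peak)       # harmonic number and the period it implies
```

    For a real sequence the peak heights would be compared with those expected for a random sequence of the same nucleotide composition before any periodicity is called significant.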

  18. Template match using local feature with view invariance

    Science.gov (United States)

    Lu, Cen; Zhou, Gang

    2013-10-01

    Matching a template image in a target image is a fundamental task in the field of computer vision. Addressing the deficiencies of traditional image matching methods and inaccurate matching in scene images with rotation, illumination and view changes, a novel matching algorithm using local features is proposed in this paper. The local histograms of the edge pixels (LHoE) are extracted as invariant features to resist view and brightness changes. The merit of the LHoE is that the edge points are little affected by view changes, and the LHoE can resist not only illumination variance but also noise pollution. Because the matching process is executed only on the edge points, the computational burden is greatly reduced. Additionally, our approach is conceptually simple, easy to implement and does not need a training phase. The view change can be considered as a combination of rotation, illumination and shear transformation. Experimental results on simulated and real data demonstrate that the proposed approach is superior to NCC (normalized cross-correlation) and histogram-based methods under view changes.

  19. An effective approach for iris recognition using phase-based image matching.

    Science.gov (United States)

    Miyazawa, Kazuyuki; Ito, Koichi; Aoki, Takafumi; Kobayashi, Koji; Nakajima, Hiroshi

    2008-10-01

    This paper presents an efficient algorithm for iris recognition using phase-based image matching, an image matching technique using phase components in 2D Discrete Fourier Transforms (DFTs) of given images. Experimental evaluation using CASIA iris image databases (versions 1.0 and 2.0) and Iris Challenge Evaluation (ICE) 2005 database clearly demonstrates that the use of phase components of iris images makes it possible to achieve highly accurate iris recognition with a simple matching algorithm. This paper also discusses major implementation issues of our algorithm. In order to reduce the size of iris data and to prevent the visibility of iris images, we introduce the idea of 2D Fourier Phase Code (FPC) for representing iris information. The 2D FPC is particularly useful for implementing compact iris recognition devices using state-of-the-art Digital Signal Processing (DSP) technology.
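
    Phase-based matching can be illustrated in one dimension: only the phase of the cross spectrum is kept, and its inverse DFT peaks at the displacement between the two signals. The sketch below is a simplification under stated assumptions (1D signals instead of 2D iris images, a naive O(N^2) DFT, and hypothetical function names); it is not the authors' implementation.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_only_correlation(f, g, eps=1e-12):
    """Inverse DFT of the normalised cross spectrum conj(F) * G.

    If g is f circularly shifted by d samples, the result peaks at index d.
    Bins whose magnitude is below eps are dropped to avoid division by zero.
    """
    spec_f, spec_g = dft(f), dft(g)
    cross = [a.conjugate() * b for a, b in zip(spec_f, spec_g)]
    phase = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    return [v.real for v in dft(phase, inverse=True)]


f = [0, 1, 4, 2, 7, 3, 1, 5]
g = [f[(i - 3) % len(f)] for i in range(len(f))]   # f shifted right by 3
poc = phase_only_correlation(f, g)
print(max(range(len(poc)), key=poc.__getitem__))    # index of the correlation peak
```

    Because magnitudes are discarded, the peak height reflects how well the phases agree, which is what makes it usable as a matching score.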

  20. Capturing prokaryotic dark matter genomes.

    Science.gov (United States)

    Gasc, Cyrielle; Ribière, Céline; Parisot, Nicolas; Beugnot, Réjane; Defois, Clémence; Petit-Biderre, Corinne; Boucher, Delphine; Peyretaillade, Eric; Peyret, Pierre

    2015-12-01

    Prokaryotes are the most diverse and abundant cellular life forms on Earth. Most of them, identified by indirect molecular approaches, belong to microbial dark matter. The advent of metagenomic and single-cell genomic approaches has highlighted the metabolic capabilities of numerous members of this dark matter through genome reconstruction. Thus, linking functions back to the species has revolutionized our understanding of how ecosystem function is sustained by the microbial world. This review will present discoveries acquired through the illumination of prokaryotic dark matter genomes by these innovative approaches. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  1. Are Current Physical Match Performance Metrics in Elite Soccer Fit for Purpose or is the Adoption of an Integrated Approach Needed?

    Science.gov (United States)

    Bradley, Paul S; Ade, Jack D

    2018-01-18

    Time-motion analysis is a valuable data-collection technique used to quantify the physical match performance of elite soccer players. For over 40 years researchers have adopted a 'traditional' approach when evaluating match demands by simply reporting the distance covered or time spent along a motion continuum of walking through to sprinting. This methodology quantifies physical metrics in isolation without integrating other factors and this ultimately leads to a one-dimensional insight into match performance. Thus, this commentary proposes a novel 'integrated' approach that focuses on a sensitive physical metric such as high-intensity running but contextualizes this in relation to key tactical activities for each position and collectively for the team. In the example presented, the 'integrated' model clearly unveils the unique high-intensity profile that exists due to distinct tactical roles, rather than one-dimensional 'blind' distances produced by 'traditional' models. Intuitively this innovative concept may aid the coaches' understanding of the physical performance in relation to the tactical roles and instructions given to the players. Additionally, it will enable practitioners to more effectively translate match metrics into training and testing protocols. This innovative model may well aid advances in other team sports that incorporate similar intermittent movements with tactical purpose. Evidence of the merits and application of this new concept is needed before the scientific community accepts this model as it may well add complexity to an area that conceivably needs simplicity.

  2. Matched pairs approach to set theoretic solutions of the Yang-Baxter equation

    International Nuclear Information System (INIS)

    Gateva-Ivanova, T.; Majid, S.

    2005-08-01

    We study set-theoretic solutions $(X,r)$ of the Yang-Baxter equations on a set $X$ in terms of the induced left and right actions of $X$ on itself. We give a characterization of involutive square-free solutions in terms of cyclicity conditions. We characterise general solutions in terms of an induced matched pair of unital semigroups $S(X,r)$ and construct $(S, r_S)$ from the matched pair. Finally, we study extensions of solutions in terms of matched pairs of their associated semigroups. We also prove several general results about matched pairs of unital semigroups of the required type, including iterated products $S \bowtie S \bowtie S$ underlying the proof that $r_S$ is a solution, and extensions $(S \bowtie T, r_{S \bowtie T})$. Examples include a general 'double' construction $(S \bowtie S, r_{S \bowtie S})$ and some concrete extensions, their actions and graphs based on small sets. (author)
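
    For small sets, the defining properties can be checked by brute force. The sketch below assumes the braid form of the equation, r12 r23 r12 = r23 r12 r23 on X x X x X, and uses three toy maps on {0, 1, 2} chosen for illustration; none of them is taken from the paper, and the function names are hypothetical.

```python
from itertools import product


def r12(r, t):
    """Apply r to the first two coordinates of a triple."""
    a, b = r(t[0], t[1])
    return (a, b, t[2])


def r23(r, t):
    """Apply r to the last two coordinates of a triple."""
    b, c = r(t[1], t[2])
    return (t[0], b, c)


def is_braided(xs, r):
    """Check r12 r23 r12 == r23 r12 r23 on every triple from xs."""
    return all(r12(r, r23(r, r12(r, t))) == r23(r, r12(r, r23(r, t)))
               for t in product(xs, repeat=3))


def is_involutive(xs, r):
    """Check that applying r twice gives back the original pair."""
    return all(r(*r(x, y)) == (x, y) for x, y in product(xs, repeat=2))


def is_square_free(xs, r):
    """Check r(x, x) == (x, x) for every x."""
    return all(r(x, x) == (x, x) for x in xs)


X = range(3)
flip = lambda x, y: (y, x)                        # trivial solution
perm = lambda x, y: ((y + 1) % 3, (x - 1) % 3)    # built from a cyclic shift and its inverse
bad = lambda x, y: ((y + 1) % 3, (2 * x) % 3)     # the two component maps do not commute

for name, r in [("flip", flip), ("perm", perm), ("bad", bad)]:
    print(name, is_braided(X, r), is_involutive(X, r), is_square_free(X, r))
```

    Exhaustive checking costs |X|^3 evaluations, so it only suits the kind of small examples the abstract refers to.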

  3. Mix-and-match holography

    KAUST Repository

    Peng, Yifan

    2017-11-22

    Computational caustics and light steering displays offer a wide range of interesting applications, ranging from art works and architectural installations to energy efficient HDR projection. In this work we expand on this concept by encoding several target images into pairs of front and rear phase-distorting surfaces. Different target holograms can be decoded by mixing and matching different front and rear surfaces under specific geometric alignments. Our approach, which we call mix-and-match holography, is made possible by moving from a refractive caustic image formation process to a diffractive, holographic one. This provides the extra bandwidth that is required to multiplex several images into pairing surfaces.

  4. A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems.

    Science.gov (United States)

    Araújo, Luciano V; Malkowski, Simon; Braghetto, Kelly R; Passos-Bueno, Maria R; Zatz, Mayana; Pu, Calton; Ferreira, João E

    2011-12-22

    Recent medical and biological technology advances have stimulated the development of new testing systems that have been providing huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of a genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to a human genome laboratory. We introduced the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients with updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through relational database implementation, and 2) correctness of processes using process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge about business process notation or process algebra. This paper presents the CEGH information system, a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have proved the feasibility and shown the usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end user interfaces.

  5. Convergent functional genomics in addiction research - a translational approach to study candidate genes and gene networks.

    Science.gov (United States)

    Spanagel, Rainer

    2013-01-01

    Convergent functional genomics (CFG) is a translational methodology that integrates in a Bayesian fashion multiple lines of evidence from studies in human and animal models to get a better understanding of the genetics of a disease or pathological behavior. Here the integration of data sets that derive from forward genetics in animals and genetic association studies including genome wide association studies (GWAS) in humans is described for addictive behavior. The aim of forward genetics in animals and association studies in humans is to identify mutations (e.g. SNPs) that produce a certain phenotype; i.e. "from phenotype to genotype". Most powerful in terms of forward genetics is combined quantitative trait loci (QTL) analysis and gene expression profiling in recombinant inbred rodent lines or animals genetically selected for a specific phenotype, e.g. high vs. low drug consumption. By Bayesian scoring, genomic information from forward genetics in animals is then combined with human GWAS data on a similar addiction-relevant phenotype. This integrative approach generates a robust candidate gene list that has to be functionally validated by means of reverse genetics in animals; i.e. "from genotype to phenotype". It is proposed that studying addiction relevant phenotypes and endophenotypes by this CFG approach will allow a better determination of the genetics of addictive behavior.

  6. i-Genome: A database to summarize oligonucleotide data in genomes

    Directory of Open Access Journals (Sweden)

    Chang Yu-Chung

    2004-10-01

    Full Text Available Background: Information on the occurrence of sequence features in genomes is crucial to comparative genomics, evolutionary analysis, the analyses of regulatory sequences and the quantitative evaluation of sequences. Computing the frequencies and the occurrences of a pattern in complete genomes is time-consuming. Results: The proposed database provides information about sequence features generated by exhaustively computing the sequences of the complete genome. The repetitive elements in the eukaryotic genomes, such as LINEs, SINEs, Alu and LTR, are obtained from Repbase. The database supports various complete genomes including human, yeast, worm, and 128 microbial genomes. Conclusions: This investigation presents and implements an efficient computational approach to accumulate the occurrences of oligonucleotides or patterns in complete genomes. A database is established to maintain the information of the sequence features, including the distributions of oligonucleotides, the gene distribution, the distribution of repetitive elements in genomes and the occurrences of the oligonucleotides. The database can provide a more effective and efficient way to access the repetitive features in genomes.
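
    The two quantities the database stores, oligonucleotide frequencies and pattern occurrences, can be sketched for a single in-memory sequence as follows. The function names are hypothetical, and nothing here reflects how i-Genome itself is implemented or scaled to complete genomes.

```python
from collections import Counter


def count_oligos(seq, k):
    """Frequency of every length-k oligonucleotide, counting overlaps."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def occurrences(seq, pattern):
    """0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions


genome = "ATATATGCGC"
print(count_oligos(genome, 2).most_common(3))
print(occurrences(genome, "ATA"))
```

    Precomputing such tables once per genome is what turns a time-consuming scan into a lookup, which is the motivation the abstract gives for a database.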

  7. A comparative study between matched and mis-matched projection/back projection pairs used with ASIRT reconstruction method

    International Nuclear Information System (INIS)

    Guedouar, R.; Zarrad, B.

    2010-01-01

    For algebraic reconstruction techniques both forward and back projection operators are needed. The ability to perform accurate reconstruction relies fundamentally on the forward projection and back projection methods, which are usually the transpose of each other. Even though mis-matched pairs may introduce additional errors during the iterative process, the usefulness of mis-matched projector/back projector pairs has been proved in image reconstruction. This work investigates the performance of matched and mis-matched reconstruction pairs using popular forward projectors and their transposes when used in reconstruction tasks with additive simultaneous iterative reconstruction techniques (ASIRT) in a parallel beam approach. Simulated noiseless phantoms are used to compare the performance of the investigated pairs in terms of the root mean squared errors (RMSE), which are calculated between reconstructed slices and the reference in different regions. Results show that mis-matched projection/back projection pairs can yield more accurate reconstructed images than matched ones. The forward projection operator performance seems independent of the choice of the back projection operator and vice versa.
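
    The role of the projector/back projector pair is easiest to see in the update rule itself. The sketch below is a generic additive SIRT-style iteration on a toy 2x2 image, written under stated assumptions: the ray and pixel normalisations, the relaxation factor and the names `additive_sirt` and `rmse` are illustrative choices, not the authors' implementation. Passing a `back` matrix that is not the transpose of `A` gives a mis-matched pair.

```python
import math


def matvec(mat, vec):
    """Matrix-vector product for lists of lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def additive_sirt(A, b, back=None, relax=1.0, iters=200):
    """Iterate x <- x + relax * C * B * R * (b - A x).

    A is the forward projector (rays x pixels); B is the back projector
    (pixels x rays) and defaults to the transpose of A, i.e. a matched pair.
    R and C are diagonal normalisations by the row sums of A and of B.
    """
    B = transpose(A) if back is None else back
    ray_norm = [1.0 / sum(row) for row in A]
    pix_norm = [1.0 / sum(row) for row in B]
    x = [0.0] * len(B)
    for _ in range(iters):
        residual = [w * (bi - pi)
                    for w, bi, pi in zip(ray_norm, b, matvec(A, x))]
        update = matvec(B, residual)
        x = [xi + relax * c * u for xi, c, u in zip(x, pix_norm, update)]
    return x


def rmse(x, ref):
    """Root mean squared error between a reconstruction and a reference."""
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(x, ref)) / len(ref))


# Toy 2x2 image (pixels x0..x3) seen by two row, two column and two diagonal rays.
A = [[1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 1, 0], [0, 1, 0, 1],
     [1, 0, 0, 1], [0, 1, 1, 0]]
phantom = [1.0, 2.0, 3.0, 4.0]
sinogram = matvec(A, phantom)          # noiseless projections
print(rmse(additive_sirt(A, sinogram), phantom))
```

    Comparing `rmse` for different `back` matrices on the same noiseless sinogram mirrors, in miniature, the kind of comparison the study describes.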

  8. A comparative study between matched and mis-matched projection/back projection pairs used with ASIRT reconstruction method

    Energy Technology Data Exchange (ETDEWEB)

    Guedouar, R., E-mail: raja_guedouar@yahoo.f [Higher School of Health Sciences and Techniques of Monastir, Av. Avicenne, 5060 Monastir, B.P. 128 (Tunisia); Zarrad, B., E-mail: boubakerzarrad@yahoo.f [Higher School of Health Sciences and Techniques of Monastir, Av. Avicenne, 5060 Monastir, B.P. 128 (Tunisia)

    2010-07-21

    For algebraic reconstruction techniques, both forward and back projection operators are needed. The ability to perform accurate reconstruction relies fundamentally on the forward projection and back projection methods, which are usually the transposes of each other. Even though mis-matched pairs may introduce additional errors during the iterative process, the usefulness of mis-matched projector/back projector pairs has been demonstrated in image reconstruction. This work investigates the performance of matched and mis-matched reconstruction pairs using popular forward projectors and their transposes when used in reconstruction tasks with additive simultaneous iterative reconstruction techniques (ASIRT) in a parallel beam approach. Simulated noiseless phantoms are used to compare the performance of the investigated pairs in terms of the root mean squared errors (RMSE), which are calculated between reconstructed slices and the reference in different regions. Results show that mis-matched projection/back projection pairs can yield more accurate reconstructed images than matched ones. The forward projection operator performance seems independent of the choice of the back projection operator and vice versa.

  9. Joint Genome Institute's Automation Approach and History

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Simon

    2006-07-05

    The Department of Energy/Joint Genome Institute (DOE/JGI) collaborates with DOE national laboratories and community users to advance genome science in support of the DOE missions of clean bio-energy, carbon cycling, and bioremediation.

  10. Efficient privacy-preserving string search and an application in genomics.

    Science.gov (United States)

    Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar

    2016-06-01

    Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within approximately 4.6 s and approximately 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contact: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
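
    To make the "iterative query operations" over a Burrows-Wheeler index more concrete, the sketch below builds an ordinary (not positional) Burrows-Wheeler transform of a short string and counts pattern occurrences by backward search. It is a plaintext teaching example under stated assumptions: it is not the positional BWT used in the paper, it contains no encryption or oblivious transfer, and the names `bwt` and `count_occurrences` are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings).

    Assumes '$' does not occur in text; it is appended as the terminator.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first_row[c] = number of characters in the text strictly smaller than c
    first_row = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index would precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open interval of matching sorted rotations
    for c in reversed(pattern):  # one query step per pattern character
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACATTA")  # hypothetical toy sequence
    print(count_occurrences(index, "TTA"))
```

    Each loop iteration needs only two rank lookups, which is the kind of small, repeated query step that the abstract describes concealing with cryptographic techniques.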

  11. Genome-wide local ancestry approach identifies genes and variants associated with chemotherapeutic susceptibility in African Americans.

    Directory of Open Access Journals (Sweden)

    Heather E Wheeler

    Full Text Available Chemotherapeutic agents are used in the treatment of many cancers, yet variable resistance and toxicities among individuals limit successful outcomes. Several studies have indicated outcome differences associated with ancestry among patients with various cancer types. Using both traditional SNP-based and newly developed gene-based genome-wide approaches, we investigated the genetics of chemotherapeutic susceptibility in lymphoblastoid cell lines derived from 83 African Americans, a population for which there is a disparity in the number of genome-wide studies performed. To account for population structure in this admixed population, we incorporated local ancestry information into our association model. We tested over 2 million SNPs and identified 325, 176, 240, and 190 SNPs that were suggestively associated with cytarabine-, 5'-deoxyfluorouridine (5'-DFUR)-, carboplatin-, and cisplatin-induced cytotoxicity, respectively (p ≤ 10^-4). Importantly, some of these variants are found only in populations of African descent. We also show that cisplatin-susceptibility SNPs are enriched for carboplatin-susceptibility SNPs. Using a gene-based genome-wide association approach, we identified 26, 11, 20, and 41 suggestive candidate genes for association with cytarabine-, 5'-DFUR-, carboplatin-, and cisplatin-induced cytotoxicity, respectively (p ≤ 10^-3). Fourteen of these genes showed evidence of association with their respective chemotherapeutic phenotypes in the Yoruba from Ibadan, Nigeria (p < 0.05), including TP53I11, COPS5 and GAS8, which are known to be involved in tumorigenesis. Although our results require further study, we have identified variants and genes associated with chemotherapeutic susceptibility in African Americans by using an approach that incorporates local ancestry information.

  12. The match-to-match variation of match-running in elite female soccer.

    Science.gov (United States)

    Trewin, Joshua; Meylan, César; Varley, Matthew C; Cronin, John

    2018-02-01

    The purpose of this study was to examine the match-to-match variation of match-running in elite female soccer players utilising GPS, using full-match and rolling period analyses. Longitudinal study. Elite female soccer players (n=45) from the same national team were observed during 55 international fixtures across 5 years (2012-2016). Data were analysed using a custom-built MS Excel spreadsheet as full matches and using a rolling 5-min analysis period, for all players who played 90-min matches (files=172). Variation was examined using the coefficient of variation and 90% confidence limits, calculated following log transformation. Total distance per minute exhibited the smallest variation when both the full-match and peak 5-min running periods were examined (CV=6.8-7.2%). Sprint-efforts were the most variable during a full match (CV=53%), whilst high-speed running per minute exhibited the greatest variation in the post-peak 5-min period (CV=143%). Peak running periods were observed as slightly more variable than full-match analyses, with the post-peak period very highly variable. Variability of accelerations (CV=17%) and Player Load (CV=14%) was lower than that of high-speed actions. Positional differences were also present, with centre backs exhibiting the greatest variation in high-speed movements (CV=41-65%). Practitioners and researchers should account for within-player variability when examining match performances. Identification of peak running periods should be used to assist in preparing for worst-case scenarios. Micro-sensor technology should be further examined as to its viable use within match analyses. Copyright © 2017 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
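
    The abstract does not state the formula used, so the following is only a sketch of one common way to obtain a coefficient of variation after log transformation: take the standard deviation s of the natural-log values and back-transform it as CV% = 100 × (exp(s) − 1). The function name `cv_percent_from_logs` and the sample distances are assumptions, and a real within-player analysis would typically also model the player as a factor.

```python
import math
import statistics


def cv_percent_from_logs(values):
    """Coefficient of variation (%) back-transformed from log-scale SD.

    Assumes all values are positive and that at least two are supplied.
    """
    logs = [math.log(v) for v in values]
    sd_log = statistics.stdev(logs)  # sample SD on the log scale
    return 100.0 * (math.exp(sd_log) - 1.0)


if __name__ == "__main__":
    # Hypothetical total distances (m/min) for one player over five matches.
    distances = [108.0, 112.5, 101.2, 115.9, 109.4]
    print(round(cv_percent_from_logs(distances), 1))
```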

  13. Theory of fractional order elements based impedance matching networks

    KAUST Repository

    Radwan, Ahmed G.

    2011-03-01

    Fractional order circuit elements (inductors and capacitors) based impedance matching networks are introduced for the first time. In comparison to the conventional integer-order L-type matching networks, fractional matching networks are much simpler and more versatile. Any complex load can be matched utilizing a single series fractional element, whereas the conventional approach generally requires two elements. It is shown that all the Smith chart circles (resistance and reactance) are actually pairs of completely identical circles. They appear to be single for the conventional integer order case, where the identical circles completely overlap each other. The concept is supported by design equations and impedance matching examples. © 2010 IEEE.
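
    The claim that a single series fractional element can match a complex load can be illustrated numerically. The sketch below assumes a generic fractional element with impedance K·(jω)^α, which has magnitude K·ω^α and phase απ/2, and solves for the α and K that make the load plus the series element equal to a real reference impedance. This parameterisation, the function name `series_fractional_match` and the example values are assumptions for illustration, not the paper's design equations.

```python
import cmath
import math


def series_fractional_match(z_load, z0, omega):
    """Find (alpha, k) so that z_load + k * (1j * omega) ** alpha == z0.

    z_load: complex load impedance (ohms); z0: real target impedance (ohms);
    omega: angular frequency (rad/s).
    """
    z_needed = complex(z0) - complex(z_load)  # impedance the element must supply
    if z_needed == 0:
        raise ValueError("load is already matched")
    # The element phase is alpha * pi / 2, so alpha follows from the needed phase.
    alpha = 2.0 * cmath.phase(z_needed) / math.pi
    # The element magnitude is k * omega ** alpha.
    k = abs(z_needed) / omega ** alpha
    return alpha, k


if __name__ == "__main__":
    omega = 2.0 * math.pi * 1.0e9  # hypothetical 1 GHz design frequency
    alpha, k = series_fractional_match(25 - 30j, 50.0, omega)
    z_in = (25 - 30j) + k * (1j * omega) ** alpha
    print(alpha, z_in)
```

    With an integer-order element the phase is fixed at ±90°, so a series element alone can cancel only reactance; allowing a fractional α frees the phase, which is why one element can suffice here. When the load resistance exceeds z0 the required |α| is greater than 1 in this formulation.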

  14. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  15. Genome-enabled Modeling of Microbial Biogeochemistry using a Trait-based Approach. Does Increasing Metabolic Complexity Increase Predictive Capabilities?

    Science.gov (United States)

    King, E.; Karaoz, U.; Molins, S.; Bouskill, N.; Anantharaman, K.; Beller, H. R.; Banfield, J. F.; Steefel, C. I.; Brodie, E.

    2015-12-01

    The biogeochemical functioning of ecosystems is shaped in part by genomic information stored in the subsurface microbiome. Cultivation-independent approaches allow us to extract this information through reconstruction of thousands of genomes from a microbial community. Analysis of these genomes, in turn, gives an indication of the organisms present and their functional roles. However, metagenomic analyses can currently deliver thousands of different genomes that range in abundance/importance, requiring the identification and assimilation of key physiologies and metabolisms to be represented as traits for successful simulation of subsurface processes. Here we focus on incorporating -omics information into BioCrunch, a genome-informed trait-based model that represents the diversity of microbial functional processes within a reactive transport framework. This approach models the rate of nutrient uptake and the thermodynamics of coupled electron donors and acceptors for a range of microbial metabolisms including heterotrophs and chemolithotrophs. Metabolism of exogenous substrates fuels catabolic and anabolic processes, with the proportion of energy used for cellular maintenance, respiration, biomass development, and enzyme production based upon dynamic intracellular and environmental conditions. This internal resource partitioning represents a trade-off against biomass formation and results in microbial community emergence across a fitness landscape. BioCrunch was used here in simulations that included organisms and metabolic pathways derived from a dataset of ~1200 non-redundant genomes reflecting a microbial community in a floodplain aquifer. Metagenomic data were directly used to parameterize trait values related to growth and to identify trait linkages associated with respiration, fermentation, and key enzymatic functions such as plant polymer degradation. Simulations spanned a range of metabolic complexities and highlight benefits originating from simulations

  16. Novel genomes and genome constitutions identified by GISH and 5S rDNA and knotted1 genomic sequences in the genus Setaria.

    Science.gov (United States)

    Zhao, Meicheng; Zhi, Hui; Doust, Andrew N; Li, Wei; Wang, Yongfang; Li, Haiquan; Jia, Guanqing; Wang, Yongqiang; Zhang, Ning; Diao, Xianmin

    2013-04-11

    The Setaria genus is increasingly of interest to researchers, as two of its species, S. viridis and S. italica, are being developed as models for understanding C4 photosynthesis and plant functional genomics. The genome constitution of Setaria species has been studied in the diploid species S. viridis, S. adhaerans and S. grisebachii, where three genomes A, B and C were identified respectively. Two allotetraploid species, S. verticillata and S. faberi, were found to have AABB genomes, and one autotetraploid species, S. queenslandica, with an AAAA genome, has also been identified. The genomes and genome constitutions of most other species remain unknown, even though the genus is thought to contain approximately 125 species distributed world-wide. GISH was performed to detect the genome constitutions of the Eurasian species S. glauca, S. plicata, and S. arenaria, with the known A, B and C genomes as probes. No or very poor hybridization signal was detected, indicating that their genomes are different from those already described. GISH was also performed reciprocally between S. glauca, S. plicata, and S. arenaria genomes, but no hybridization signals between each other were found. The two sets of chromosomes of S. lachnea both hybridized strongly only with the known C genome of S. grisebachii. Chromosomes of Qing 9, an accession formerly considered as S. viridis, hybridized strongly only with the B genome of S. adhaerans. Phylogenetic trees constructed with 5S rDNA and knotted1 markers clearly classify the samples in this study into six clusters, matching the GISH results and suggesting that the F genome of S. arenaria is basal in the genus. Three novel genomes in the Setaria genus were identified and designated as genome D (S. glauca), E (S. plicata) and F (S. arenaria) respectively. The genome constitution of tetraploid S. lachnea is putatively CCC'C'. Qing 9 is a B genome species indigenous to China and is hypothesized to be a newly identified species. The

  17. Theoretical investigation into negative differential resistance characteristics of resonant tunneling diodes based on lattice-matched and polarization-matched AlInN/GaN heterostructures

    Science.gov (United States)

    Rong, Taotao; Yang, Lin-An; Yang, Lin; Hao, Yue

    2018-01-01

    In this work, we report an investigation of resonant tunneling diodes (RTDs) with lattice-matched and polarization-matched AlInN/GaN heterostructures using numerical simulation. Compared with the lattice-matched AlInN/GaN RTDs, the RTDs based on polarization-matched AlInN/GaN heterostructures exhibit symmetrical conduction band profiles because the polarization charge discontinuity is eliminated. This yields equivalent double-barrier transmission coefficients and, in turn, a relatively high driving current, a highly symmetric current density, and a high peak-to-valley current ratio (PVCR) under both positive and negative sweeping voltages. Simulations show that the peak current density approaches 1.2 × 10^7 A/cm^2 at a bias voltage of 0.72 V and the PVCR approaches 1.37 at both sweeping voltages. They also show that, for the same shallow energy level, when the trap density reaches 1 × 10^19 cm^-3, the polarization-matched RTDs still have acceptable negative differential resistance (NDR) characteristics, while the NDR characteristics of lattice-matched RTDs become irregular. After introducing a deeper energy level of 1 eV into the polarization-matched and lattice-matched RTDs, 60 scans are performed under the same trap density. Simulation results show that the degradation of the polarization-matched RTDs is 22%, while lattice-matched RTDs have a degradation of 55%. The polarization-matched RTDs therefore have a greater defect tolerance than the lattice-matched RTDs, which is beneficial for the practical manufacture of terahertz RTD devices.

  18. A genomic pathway approach to a complex disease: axon guidance and Parkinson disease.

    Directory of Open Access Journals (Sweden)

    Timothy G Lesnick

    2007-06-01

    Full Text Available While major inroads have been made in identifying the genetic causes of rare Mendelian disorders, little progress has been made in the discovery of common gene variations that predispose to complex diseases. The single gene variants that have been shown to associate reproducibly with complex diseases typically have small effect sizes or attributable risks. However, the joint actions of common gene variants within pathways may play a major role in predisposing to complex diseases (the paradigm of complex genetics). The goal of this study was to determine whether polymorphism in a candidate pathway (axon guidance) predisposed to a complex disease (Parkinson disease [PD]). We mined a whole-genome association dataset and identified single nucleotide polymorphisms (SNPs) that were within axon-guidance pathway genes. We then constructed models of axon-guidance pathway SNPs that predicted three outcomes: PD susceptibility (odds ratio = 90.8, p = 4.64 × 10^-38), survival free of PD (hazards ratio = 19.0, p = 5.43 × 10^-48), and PD age at onset (R^2 = 0.68, p = 1.68 × 10^-51). By contrast, models constructed from thousands of random selections of genomic SNPs predicted the three PD outcomes poorly. Mining of a second whole-genome association dataset and mining of an expression profiling dataset also supported a role for many axon-guidance pathway genes in PD. These findings could have important implications regarding the pathogenesis of PD. This genomic pathway approach may also offer insights into other complex diseases such as Alzheimer disease, diabetes mellitus, nicotine and alcohol dependence, and several cancers.

  19. Discovery and annotation of small proteins using genomics, proteomics and computational approaches

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Xiaohan; Tschaplinski, Timothy J.; Hurst, Gregory B.; Jawdy, Sara; Abraham, Paul E.; Lankford, Patricia K.; Adams, Rachel M.; Shah, Manesh B.; Hettich, Robert L.; Lindquist, Erika; Kalluri, Udaya C.; Gunter, Lee E.; Pennacchio, Christa; Tuskan, Gerald A.

    2011-03-02

    Small proteins (10-200 amino acids [aa] in length) encoded by short open reading frames (sORF) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained 2.6 million expressed sequence tag (EST) reads from Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10-200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) coding potential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. In the high-confidence sORF candidate set, known protein domains were identified in 1282 genes (higher-confidence sORF candidate set), out of which 611 genes, designated as highest-confidence candidate sORF set, were supported by proteomics data. Of the 611 highest-confidence candidate sORF genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that there are potential sORF candidates to be annotated in sequenced genomes, but also presents an efficient strategy for discovery of sORFs in species with no genome annotation yet available.
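
    The first step described above, collecting candidate sORFs within a length window, can be sketched as a simple scan of a transcript. The example below is an assumption-laden illustration rather than the study's pipeline: it searches only the three forward reading frames, treats the first ATG after the previous stop as the start, uses the standard stop codons, and the function name `find_sorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_aa=10, max_aa=200):
    """Return (start, end, n_aa) for forward-strand ORFs of min_aa..max_aa codons.

    start/end are 0-based, end-exclusive nucleotide coordinates including the
    stop codon; n_aa counts codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # close the ORF and look for the next ATG
    return hits


if __name__ == "__main__":
    # Hypothetical transcript: a 12-codon ORF followed by a 2-codon ORF.
    transcript = "CC" + "ATG" + "GCT" * 11 + "TAA" + "ATGGCTTAG"
    print(find_sorfs(transcript))
```

    A scan like this over-predicts heavily, which is why the abstract follows it with coding potential, conservation, and gene family filters.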

  20. Applying Shannon's information theory to bacterial and phage genomes and metagenomes

    Science.gov (United States)

    Akhter, Sajia; Bailey, Barbara A.; Salamon, Peter; Aziz, Ramy K.; Edwards, Robert A.

    2013-01-01

    All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
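
    For readers unfamiliar with the measure, Shannon's uncertainty for a sequence at a given word size can be computed from the frequencies p_w of its overlapping words as H = −Σ p_w log2 p_w. The sketch below is a minimal illustration of that formula; the function name `shannon_entropy` and the test strings are assumptions, and the exact normalisation used in the study may differ.

```python
import math
from collections import Counter


def shannon_entropy(sequence, word_size=1):
    """Shannon entropy (bits) of the overlapping words of length word_size."""
    words = [sequence[i:i + word_size]
             for i in range(len(sequence) - word_size + 1)]
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over the observed word frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("AAAAAAAA"))     # a single repeated base
    print(shannon_entropy("ACGTACGT"))     # four equally frequent bases
    print(shannon_entropy("ACGTACGT", 2))  # same sequence, word size 2
```

    Because the number of possible words grows as 4^k, short sequences cannot populate them all, which is one reason the measured information depends on both genome length and word size.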

  1. Unexpected observations after mapping LongSAGE tags to the human genome

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2007-05-01

    Full Text Available Abstract Background: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. Results: Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. 31% of these tags correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are located either in antisense or in new variants of these known transcripts. Conclusion: We performed a comprehensive study of all publicly available human LongSAGE tags, and carefully verified the reliability of these data. We found the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been under-estimated. The frequency of tags matching the genome sequence once but not in an annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts. SAGE data is appropriate to map new transcripts to the genome, as demonstrated by the high rate of cross

  2. Practical Approaches for Detecting Selection in Microbial Genomes

    OpenAIRE

    Hedge, Jessica; Wilson, Daniel J.

    2016-01-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers th...

  3. Approaches for Comparative Genomics in Aspergillus and Penicillium

    DEFF Research Database (Denmark)

    Rasmussen, Jane Lind Nybo; Theobald, Sebastian; Brandl, Julian

    2016-01-01

    and applicable for many types of studies. In this chapter, we provide an overview of the state-of-the-art of comparative genomics in these fungi, along with recommended methods. The chapter describes databases for fungal comparative genomics. Based on experience, we suggest strategies for multiple types...... of comparative genomics, ranging from analysis of single genes, over gene clusters and CaZymes to genome-scale comparative genomics. Furthermore, we have examined published comparative genomics papers to summarize the preferred bioinformatic methods and parameters for a given type of analysis, highly useful...... comparative genomics to the development in bacterial genomics, where the comparison of hundreds of genomes has been performed for a while....

  4. Genomes to Proteomes

    Energy Technology Data Exchange (ETDEWEB)

    Panisko, Ellen A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Grigoriev, Igor [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Daly, Don S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Webb-Robertson, Bobbie-Jo [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Baker, Scott E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2009-03-01

    Biologists are awash with genomic sequence data. In large part, this is due to the rapid acceleration in the generation of DNA sequence that occurred as public and private research institutes raced to sequence the human genome. In parallel with the large human genome effort, mostly smaller genomes of other important model organisms were sequenced. Projects following on these initial efforts have made use of technological advances and the DNA sequencing infrastructure that was built for the human and other organism genome projects. As a result, the genome sequences of many organisms are available in high quality draft form. While in many ways this is good news, there are limitations to the biological insights that can be gleaned from DNA sequences alone; genome sequences offer only a bird's eye view of the biological processes endemic to an organism or community. Fortunately, the genome sequences now being produced at such a high rate can serve as the foundation for other global experimental platforms such as proteomics. Proteomic methods offer a snapshot of the proteins present at a point in time for a given biological sample. Current global proteomics methods combine enzymatic digestion, separations, mass spectrometry and database searching for peptide identification. One key aspect of proteomics is the prediction of peptide sequences from mass spectrometry data. Global proteomic analysis uses computational matching of experimental mass spectra with predicted spectra based on databases of gene models that are often generated computationally. Thus, the quality of gene models predicted from a genome sequence is crucial in the generation of high quality peptide identifications. Once peptides are identified they can be assigned to their parent protein. Proteins identified as expressed in a given experiment are most useful when compared to other expressed proteins in a larger biological context or biochemical pathway. In this chapter we will discuss the automatic

  5. Synthetic biology approaches in cancer immunotherapy, genetic network engineering, and genome editing.

    Science.gov (United States)

    Chakravarti, Deboki; Cho, Jang Hwan; Weinberg, Benjamin H; Wong, Nicole M; Wong, Wilson W

    2016-04-18

    Investigations into cells and their contents have provided evolving insight into the emergence of complex biological behaviors. Capitalizing on this knowledge, synthetic biology seeks to manipulate the cellular machinery towards novel purposes, extending discoveries from basic science to new applications. While these developments have demonstrated the potential of building with biological parts, the complexity of cells can pose numerous challenges. In this review, we will highlight the broad and vital role that the synthetic biology approach has played in applying fundamental biological discoveries in receptors, genetic circuits, and genome-editing systems towards translation in the fields of immunotherapy, biosensors, disease models and gene therapy. These examples are evidence of the strength of synthetic approaches, while also illustrating considerations that must be addressed when developing systems around living cells.

  6. Lactobacillus paracasei comparative genomics: towards species pan-genome definition and exploitation of diversity.

    Directory of Open Access Journals (Sweden)

    Tamara Smokvina

    Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis
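
    The pan-genome, core genome and variome described in this record can be illustrated with plain set operations. The sketch below is only an illustration under stated assumptions: the function name pan_and_core, the strain names and the ortholog-group IDs (OG1, OG2, ...) are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
from typing import Dict, Set, Tuple


def pan_and_core(
    strain_groups: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan_genome, core_genome, variome) as sets of ortholog-group IDs.

    strain_groups maps a strain name to the ortholog groups found in it.
    """
    if not strain_groups:
        return set(), set(), set()
    group_sets = list(strain_groups.values())
    pan = set().union(*group_sets)           # groups present in any strain
    core = set.intersection(*group_sets)     # groups present in every strain
    variome = pan - core                     # groups missing from at least one strain
    return pan, core, variome


# Hypothetical toy input: three strains, five ortholog groups.
strains = {
    "strain_A": {"OG1", "OG2", "OG3"},
    "strain_B": {"OG1", "OG2", "OG4"},
    "strain_C": {"OG1", "OG2", "OG3", "OG5"},
}
pan, core, variome = pan_and_core(strains)
print(len(pan), len(core), len(variome))
```

    With real data, each strain's set would hold the ortholog groups assigned by whatever clustering step precedes this one.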

  7. Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M

    Directory of Open Access Journals (Sweden)

    Pradeepkiran JA

    2015-03-01

    Jangampalli Adi Pradeepkiran,1* Sri Bhashyam Sainath,2,3* Konidala Kranthi Kumar,1 Matcha Bhaskar1 1Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati, India; 2CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, Porto, Portugal, 3Department of Biotechnology, Vikrama Simhapuri University, Nellore, Andhra Pradesh, India *These authors contributed equally to this work Abstract: Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to humans and essential for the pathogen B. melitensis 16M. The results revealed that, within the 3 Mb genome of the pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A further homology model of the target was constructed using MODELLER 9.12 and optimized through variable target function method by molecular dynamics optimization with simulated annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the

  8. Approaching the Sequential and Three-Dimensional Organization of Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2006-01-01

    Genomes are one of the major foundations of life due to their role in information storage, process regulation and evolution. To achieve a deeper understanding of the human genome the three-dimensional organization of the human cell nucleus, the structural-, scaling- and dynamic

  9. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Science.gov (United States)

    Budinich, Marko; Bourdon, Jérémie; Larhlimi, Abdelhalim; Eveillard, Damien

    2017-01-01

    Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity) that take place at the ecosystem scale.

  10. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    Science.gov (United States)

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. 
Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome

  11. Approaches to advancing quantitative human health risk assessment of environmental chemicals in the post-genomic era.

    Science.gov (United States)

    Chiu, Weihsueh A; Euling, Susan Y; Scott, Cheryl Siegel; Subramaniam, Ravi P

    2013-09-15

    The contribution of genomics and associated technologies to human health risk assessment for environmental chemicals has focused largely on elucidating mechanisms of toxicity, as discussed in other articles in this issue. However, there is interest in moving beyond hazard characterization to making more direct impacts on quantitative risk assessment (QRA)--i.e., the determination of toxicity values for setting exposure standards and cleanup values. We propose that the evolution of QRA of environmental chemicals in the post-genomic era will involve three, somewhat overlapping phases in which different types of approaches begin to mature. The initial focus (in Phase I) has been and continues to be on "augmentation" of weight of evidence--using genomic and related technologies qualitatively to increase the confidence in and scientific basis of the results of QRA. Efforts aimed towards "integration" of these data with traditional animal-based approaches, in particular quantitative predictors, or surrogates, for the in vivo toxicity data to which they have been anchored are just beginning to be explored now (in Phase II). In parallel, there is a recognized need for "expansion" of the use of established biomarkers of susceptibility or risk of human diseases and disorders for QRA, particularly for addressing the issues of cumulative assessment and population risk. Ultimately (in Phase III), substantial further advances could be realized by the development of novel molecular and pathway-based biomarkers and statistical and in silico models that build on anticipated progress in understanding the pathways of human diseases and disorders. Such efforts would facilitate a gradual "reorientation" of QRA towards approaches that more directly link environmental exposures to human outcomes. Published by Elsevier Inc.

  12. Approaches to advancing quantitative human health risk assessment of environmental chemicals in the post-genomic era

    Energy Technology Data Exchange (ETDEWEB)

    Chiu, Weihsueh A., E-mail: chiu.weihsueh@epa.gov [National Center for Environmental Assessment, U.S. Environmental Protection Agency, Washington DC, 20460 (United States); Euling, Susan Y.; Scott, Cheryl Siegel; Subramaniam, Ravi P. [National Center for Environmental Assessment, U.S. Environmental Protection Agency, Washington DC, 20460 (United States)

    2013-09-15

    The contribution of genomics and associated technologies to human health risk assessment for environmental chemicals has focused largely on elucidating mechanisms of toxicity, as discussed in other articles in this issue. However, there is interest in moving beyond hazard characterization to making more direct impacts on quantitative risk assessment (QRA) — i.e., the determination of toxicity values for setting exposure standards and cleanup values. We propose that the evolution of QRA of environmental chemicals in the post-genomic era will involve three, somewhat overlapping phases in which different types of approaches begin to mature. The initial focus (in Phase I) has been and continues to be on “augmentation” of weight of evidence — using genomic and related technologies qualitatively to increase the confidence in and scientific basis of the results of QRA. Efforts aimed towards “integration” of these data with traditional animal-based approaches, in particular quantitative predictors, or surrogates, for the in vivo toxicity data to which they have been anchored are just beginning to be explored now (in Phase II). In parallel, there is a recognized need for “expansion” of the use of established biomarkers of susceptibility or risk of human diseases and disorders for QRA, particularly for addressing the issues of cumulative assessment and population risk. Ultimately (in Phase III), substantial further advances could be realized by the development of novel molecular and pathway-based biomarkers and statistical and in silico models that build on anticipated progress in understanding the pathways of human diseases and disorders. Such efforts would facilitate a gradual “reorientation” of QRA towards approaches that more directly link environmental exposures to human outcomes.

  13. Approaches to advancing quantitative human health risk assessment of environmental chemicals in the post-genomic era

    International Nuclear Information System (INIS)

    Chiu, Weihsueh A.; Euling, Susan Y.; Scott, Cheryl Siegel; Subramaniam, Ravi P.

    2013-01-01

    The contribution of genomics and associated technologies to human health risk assessment for environmental chemicals has focused largely on elucidating mechanisms of toxicity, as discussed in other articles in this issue. However, there is interest in moving beyond hazard characterization to making more direct impacts on quantitative risk assessment (QRA) — i.e., the determination of toxicity values for setting exposure standards and cleanup values. We propose that the evolution of QRA of environmental chemicals in the post-genomic era will involve three, somewhat overlapping phases in which different types of approaches begin to mature. The initial focus (in Phase I) has been and continues to be on “augmentation” of weight of evidence — using genomic and related technologies qualitatively to increase the confidence in and scientific basis of the results of QRA. Efforts aimed towards “integration” of these data with traditional animal-based approaches, in particular quantitative predictors, or surrogates, for the in vivo toxicity data to which they have been anchored are just beginning to be explored now (in Phase II). In parallel, there is a recognized need for “expansion” of the use of established biomarkers of susceptibility or risk of human diseases and disorders for QRA, particularly for addressing the issues of cumulative assessment and population risk. Ultimately (in Phase III), substantial further advances could be realized by the development of novel molecular and pathway-based biomarkers and statistical and in silico models that build on anticipated progress in understanding the pathways of human diseases and disorders. Such efforts would facilitate a gradual “reorientation” of QRA towards approaches that more directly link environmental exposures to human outcomes

  14. A mixed-integer linear programming approach to the reduction of genome-scale metabolic networks.

    Science.gov (United States)

    Röhl, Annika; Bockmayr, Alexander

    2017-01-03

    Constraint-based analysis has become a widely used method to study metabolic networks. While some of the associated algorithms can be applied to genome-scale network reconstructions with several thousands of reactions, others are limited to small or medium-sized models. In 2015, Erdrich et al. introduced a method called NetworkReducer, which reduces large metabolic networks to smaller subnetworks, while preserving a set of biological requirements that can be specified by the user. Already in 2001, Burgard et al. developed a mixed-integer linear programming (MILP) approach for computing minimal reaction sets under a given growth requirement. Here we present an MILP approach for computing minimum subnetworks with the given properties. The minimality (with respect to the number of active reactions) is not guaranteed by NetworkReducer, while the method by Burgard et al. does not allow specifying the different biological requirements. Our procedure is about 5-10 times faster than NetworkReducer and can enumerate all minimum subnetworks in case there exist several ones. This allows identifying common reactions that are present in all subnetworks, and reactions appearing in alternative pathways. Applying complex analysis methods to genome-scale metabolic networks is often not possible in practice. Thus it may become necessary to reduce the size of the network while keeping important functionalities. We propose a MILP solution to this problem. Compared to previous work, our approach is more efficient and allows computing not only one, but even all minimum subnetworks satisfying the required properties.

  15. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Science.gov (United States)

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available
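
    The reliability rule stated in this record (a hit counts as reliable if at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be sketched as a simple filter. This is not the authors' PPM script: the function name reliable_hits, the gene names and the descriptor IDs below are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, Iterable, Set


def reliable_hits(
    gene_descriptors: Dict[str, Iterable[str]],
    relevant: Set[str],
    min_descriptors: int = 2,
) -> Dict[str, Set[str]]:
    """Keep genes annotated with at least min_descriptors relevant descriptors.

    gene_descriptors maps a gene ID to all GO-terms and PROSITE IDs found for it;
    relevant is the set of descriptors associated with the function of interest.
    """
    reliable = {}
    for gene, descriptors in gene_descriptors.items():
        matched = set(descriptors) & relevant   # relevant descriptors on this gene
        if len(matched) >= min_descriptors:
            reliable[gene] = matched
    return reliable


# Hypothetical placeholder IDs, not real annotations from INDIGO.
relevant_ids = {"GO:PLACEHOLDER_1", "GO:PLACEHOLDER_2", "PS_PLACEHOLDER_1"}
annotations = {
    "gene_1": ["GO:PLACEHOLDER_1", "PS_PLACEHOLDER_1"],   # two relevant descriptors
    "gene_2": ["GO:PLACEHOLDER_2"],                       # only one
    "gene_3": ["GO:UNRELATED", "PS_UNRELATED"],           # none
}
hits = reliable_hits(annotations, relevant_ids)
print(sorted(hits))
```

    Requiring agreement between independent descriptors is what removes single-evidence false positives; the threshold of two is taken from the abstract, everything else is an assumption.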

  16. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available

  17. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  18. Genome-wide association study of insect bite hypersensitivity in two horse populations in the Netherlands

    Directory of Open Access Journals (Sweden)

    Schurink Anouk

    2012-10-01

    Background: Insect bite hypersensitivity is a common allergic disease in horse populations worldwide. Insect bite hypersensitivity is affected by both environmental and genetic factors. However, little is known about genes contributing to the genetic variance associated with insect bite hypersensitivity. Therefore, the aim of our study was to identify and quantify genomic associations with insect bite hypersensitivity in Shetland pony mares and Icelandic horses in the Netherlands. Methods: Data on 200 Shetland pony mares and 146 Icelandic horses were collected according to a matched case–control design. Cases and controls were matched on various factors (e.g. region, sire) to minimize effects of population stratification. Breed-specific genome-wide association studies were performed using 70 k single nucleotide polymorphism genotypes. The Bayesian variable selection method Bayes-C with a threshold model implemented in GenSel software was applied. A 1 Mb non-overlapping window approach that accumulated contributions of adjacent single nucleotide polymorphisms was used to identify associated genomic regions. Results: The percentage of variance explained by all single nucleotide polymorphisms was 13% in Shetland pony mares and 28% in Icelandic horses. The 20 non-overlapping windows explaining the largest percentages of genetic variance were found on nine chromosomes in Shetland pony mares and on 14 chromosomes in Icelandic horses. Overlap in identified associated genomic regions between breeds would suggest interesting candidate regions to follow up on. Such regions common to both breeds (within 15 Mb) were found on chromosomes 3, 7, 11, 20 and 23. Positional candidate genes within 2 Mb from the associated windows were identified on chromosome 20 in both breeds. Candidate genes are within the equine lymphocyte antigen class II region, which evokes an immune response by recognizing many foreign molecules. Conclusions: The genome-wide association

  19. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute's genomic medicine portfolio.

    Science.gov (United States)

    Manolio, Teri A

    2016-10-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual's genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of "Genomic Medicine Meetings," under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI's genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. Published by Elsevier Ireland Ltd.

  20. A Genomics Approach to Tumor Genome Analysis

    National Research Council Canada - National Science Library

    Collins, Colin

    2002-01-01

    Genomes of solid tumors are often highly rearranged and these rearrangements promote cancer progression through disruption of genes mediating immortality, survival, metastasis, and resistance to therapy...

  1. A Biochemical Approach to Understanding the Fanconi Anemia Pathway-Regulated Nucleases in Genome Maintenance for Preventing Bone Marrow Failure and Cancer

    Science.gov (United States)

    2014-04-01

    Fanconi anemia is the most prevalent of the inherited BMF syndromes, caused by mutations in

  2. Fiber cavities with integrated mode matching optics.

    Science.gov (United States)

    Gulati, Gurpreet Kaur; Takahashi, Hiroki; Podoliak, Nina; Horak, Peter; Keller, Matthias

    2017-07-17

    In fiber based Fabry-Pérot Cavities (FFPCs), limited spatial mode matching between the cavity mode and input/output modes has been the main hindrance for many applications. We have demonstrated a versatile mode matching method for FFPCs. Our novel design employs an assembly of a graded-index and large core multimode fiber directly spliced to a single mode fiber. This all-fiber assembly transforms the propagating mode of the single mode fiber to match with the mode of a FFPC. As a result, we have measured a mode matching of 90% for a cavity length of ~400 μm. This is a significant improvement compared to conventional FFPCs coupled with just a single mode fiber, especially at long cavity lengths. Adjusting the parameters of the assembly, the fundamental cavity mode can be matched with the mode of almost any single mode fiber, making this approach highly versatile and integrable.

  3. Genomic approaches in aquaculture and fisheries

    DEFF Research Database (Denmark)

    Cancela, M. Leonor; Bargelloni, Luca; Boudry, Pierre

    2010-01-01

    . Improving state-of-the-art genomics research in various aquaculture systems, as well as its industrial applications, remains one of the major challenges in this area and should be the focus of well developed strategies to be implemented in the next generation of projects. This chapter will first provide...

  4. Evolution of the Largest Mammalian Genome.

    Science.gov (United States)

    Evans, Ben J; Upham, Nathan S; Golding, Goeffrey B; Ojeda, Ricardo A; Ojeda, Agustina A

    2017-06-01

    The genome of the red vizcacha rat (Rodentia, Octodontidae, Tympanoctomys barrerae) is the largest of all mammals, and about double the size of their close relative, the mountain vizcacha rat Octomys mimax, even though the lineages that gave rise to these species diverged from each other only about 5 Ma. The mechanism for this rapid genome expansion is controversial, and hypothesized to be a consequence of whole genome duplication or accumulation of repetitive elements. To test these alternative but nonexclusive hypotheses, we gathered and evaluated evidence from whole transcriptome and whole genome sequences of T. barrerae and O. mimax. We recovered support for genome expansion due to accumulation of a diverse assemblage of repetitive elements, which represent about one half and one fifth of the genomes of T. barrerae and O. mimax, respectively, but we found no strong signal of whole genome duplication. In both species, repetitive sequences were rare in transcribed regions as compared with the rest of the genome, and mostly had no close match to annotated repetitive sequences from other rodents. These findings raise new questions about the genomic dynamics of these repetitive elements, their connection to widespread chromosomal fissions that occurred in the T. barrerae ancestor, and their fitness effects-including during the evolution of hypersaline dietary tolerance in T. barrerae. ©The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

    Directory of Open Access Journals (Sweden)

    Jingyuan Zhao

    2012-01-01

    We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and Jeffreys' prior penalty (Firth, 1993), a sequence of nested candidate models is formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al. 2005) and the LASSO-patternsearch algorithm (Shi et al. 2007).
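
    For readers unfamiliar with the SCAD penalty named in this record, the sketch below evaluates its standard piecewise form from Fan and Li (2001) for a single coefficient. It illustrates the penalty function only and is not the two-stage procedure of the paper; the function name scad_penalty is hypothetical, and the default a = 3.7 is the value Fan and Li suggest, assumed here rather than taken from this abstract.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient theta with tuning parameters lam and a.

    Piecewise definition (Fan and Li, 2001), requiring lam >= 0 and a > 2:
      |theta| <= lam          : lam * |theta|                (same as L1)
      lam < |theta| <= a*lam  : quadratic transition
      |theta| > a*lam         : constant lam^2 * (a + 1) / 2
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Small coefficients are penalized like the lasso; large ones get a flat penalty.
for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

    The flat region for large coefficients is what distinguishes SCAD from the L1 penalty used in the screening stage: strong effects are not shrunk further once they exceed a * lam.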

  6. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  7. Multimodal Personal Verification Using Likelihood Ratio for the Match Score Fusion

    Directory of Open Access Journals (Sweden)

    Long Binh Tran

    2017-01-01

    In this paper, the authors present a novel personal verification system based on the likelihood ratio test for fusion of match scores from multiple biometric matchers (face, fingerprint, hand shape, and palm print). In the proposed system, multimodal features are extracted by Zernike Moment (ZM). After matching, the match scores from multiple biometric matchers are fused based on the likelihood ratio test. A finite Gaussian mixture model (GMM) is used for estimating the genuine and impostor densities of match scores for personal verification. Our approach is also compared with other well-known approaches such as the support vector machine and the sum rule with min-max normalization. The experimental results confirm that the proposed system achieves excellent identification performance, with higher accuracy than these well-known approaches, and thus can be used in more applications related to person verification.
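
    The fusion rule described above can be illustrated with a small sketch. It assumes, for simplicity, one-dimensional Gaussian mixtures per matcher and independence between matchers, so that the fused likelihood ratio is a product of per-matcher ratios; the paper may instead model the score vector jointly. All function names and mixture parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """components: list of (weight, mean, std) with weights summing to 1."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def fused_likelihood_ratio(scores, genuine_models, impostor_models):
    """Product over matchers of p(score | genuine) / p(score | impostor).

    Assumes the matchers are independent (a simplification) and that the
    impostor density is non-zero at every observed score.
    """
    ratio = 1.0
    for score, genuine, impostor in zip(scores, genuine_models, impostor_models):
        ratio *= mixture_pdf(score, genuine) / mixture_pdf(score, impostor)
    return ratio


def verify(scores, genuine_models, impostor_models, threshold=1.0):
    """Accept the identity claim when the fused ratio reaches the threshold."""
    return fused_likelihood_ratio(scores, genuine_models, impostor_models) >= threshold
```

    In practice the mixture parameters would be estimated from training scores (for example with expectation-maximization) and the threshold would be chosen for a target false accept rate.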

  8. History Matching in Parallel Computational Environments

    Energy Technology Data Exchange (ETDEWEB)

    Steven Bryant; Sanjay Srinivasan; Alvaro Barrera; Sharad Yadav

    2005-10-01

    A novel methodology for delineating multiple reservoir domains for the purpose of history matching in a distributed computing environment has been proposed. A fully probabilistic approach to perturb permeability within the delineated zones is implemented. The combination of robust schemes for identifying reservoir zones and distributed computing significantly increases the accuracy and efficiency of the probabilistic approach. The information pertaining to the permeability variations in the reservoir that is contained in dynamic data is calibrated in terms of a deformation parameter rD. This information is merged with the prior geologic information in order to generate permeability models consistent with the observed dynamic data as well as the prior geology. The relationship between dynamic response data and reservoir attributes may vary in different regions of the reservoir due to spatial variations in reservoir attributes, well configuration, flow constraints etc. The probabilistic approach then has to account for multiple rD values in different regions of the reservoir. In order to delineate reservoir domains that can be characterized with different rD parameters, principal component analysis (PCA) of the Hessian matrix has been done. The Hessian matrix summarizes the sensitivity of the objective function at a given step of the history matching to model parameters. It also measures the interaction of the parameters in affecting the objective function. The basic premise of PC analysis is to isolate the most sensitive and least correlated regions. The eigenvectors obtained during the PCA are suitably scaled and appropriate grid block volume cut-offs are defined such that the resultant domains are neither too large (which increases interactions between domains) nor too small (implying ineffective history matching). The delineation of domains requires calculation of the Hessian, which could be computationally costly and also restricts the current approach to

  9. Multi-image Matching of Airborne SAR Imagery by SANCC

    Directory of Open Access Journals (Sweden)

    DING Hao

    2015-03-01

    In order to improve the accuracy of SAR matching, a multi-image matching method based on the sum of adaptive normalized cross-correlation (SANCC) is proposed. It utilizes geometrical and radiometric information of multi-baseline synthetic aperture radar (SAR) images effectively. Firstly, imaging parameters, platform parameters and an approximate digital surface model (DSM) are used to predict the matching line. Secondly, similarity and proximity in Gestalt theory are introduced to SANCC, and SANCC measures of potential matching points along the matching line are calculated. Thirdly, multi-image matching results and object coordinates of matching points are obtained by a winner-take-all (WTA) optimization strategy. The approach has been demonstrated with airborne SAR images acquired by a Chinese airborne SAR system (CASMSAR system). The experimental results indicate that the proposed algorithm is effective for providing dense and accurate matching points, reducing the number of mismatches caused by repeated textures, and offering a better solution to matching in poorly textured areas.
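
    The scoring and selection steps above can be sketched in simplified form. The example below sums plain normalized cross-correlation over several search images and picks the winner-take-all candidate; it deliberately omits the adaptive Gestalt-based weighting that distinguishes SANCC, and the function names and patch layout (flattened lists of pixel values) are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flattened patches."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant patch carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom


def sum_ncc(reference_patch, patches):
    """Sum of NCC values between the reference patch and one patch per search image."""
    return sum(ncc(reference_patch, p) for p in patches)


def winner_take_all(reference_patch, candidates):
    """candidates[i] holds the patches (one per search image) for the i-th
    candidate position along the matching line. Returns the winning index."""
    scores = [sum_ncc(reference_patch, patches) for patches in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

    Summing over several images is what lets a candidate that looks plausible in one image, for example because of a repeated texture, be outvoted when it correlates poorly in the others.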

  10. Using machine learning to assess covariate balance in matching studies.

    Science.gov (United States)

    Linden, Ariel; Yarnold, Paul R

    2016-12-01

    In order to assess the effectiveness of matching approaches in observational studies, investigators typically present summary statistics for each observed pre-intervention covariate, with the objective of showing that matching reduces the difference in means (or proportions) between groups to as close to zero as possible. In this paper, we introduce a new approach to distinguish between study groups based on their distributions of the covariates using a machine-learning algorithm called optimal discriminant analysis (ODA). Assessing covariate balance using ODA as compared with the conventional method has several key advantages: the ability to ascertain how individuals self-select based on optimal (maximum-accuracy) cut-points on the covariates; the application to any variable metric and number of groups; its insensitivity to skewed data or outliers; and the use of accuracy measures that can be widely applied to all analyses. Moreover, ODA accepts analytic weights, thereby extending the assessment of covariate balance to any study design where weights are used for covariate adjustment. By comparing the two approaches using empirical data, we are able to demonstrate that using measures of classification accuracy as balance diagnostics produces results highly consistent with those obtained via the conventional approach (in our matched-pairs example, ODA revealed a weak statistically significant relationship not detected by the conventional approach). Thus, investigators should consider ODA as a robust complement, or perhaps alternative, to the conventional approach for assessing covariate balance in matching studies. © 2016 John Wiley & Sons, Ltd.
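
    As a point of reference for the conventional diagnostic mentioned above, the sketch below computes a standardized mean difference for one covariate. This is one widely used summary of the difference in means between groups and is shown here as an assumption about what "conventional" typically means; it is not the ODA procedure and the function name is hypothetical.

```python
import statistics


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation.

    Values near zero indicate that the covariate is well balanced.
    """
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_var = (statistics.variance(treated) + statistics.variance(control)) / 2.0
    if pooled_var == 0.0:
        raise ValueError("covariate has no variation in either group")
    return mean_diff / pooled_var ** 0.5
```

    Computing this before and after matching, covariate by covariate, shows whether matching moved each difference toward zero.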

  11. Rethinking the Match: A Proposal for Modern Matchmaking.

    Science.gov (United States)

    Ray, Chris; Bishop, Steven E; Dow, Alan W

    2018-01-01

    Since the 1950s, the National Resident Matching Program, or "the Match," has governed the placement of medical students into residencies. The Match was created to protect students in an era when residency positions outnumbered applicants and hospitals pressured students early in their academic careers to commit to a residency position. Now, however, applicants outnumber positions, applicants are applying to increasing numbers of programs, and the costs of the Match for applicants and programs are high. Meanwhile, medical education is evolving toward a competency-based approach, a U.S. physician shortage is predicted, and some researchers describe a "July effect": worse clinical outcomes correlated with the mass entry of new residents. Against this background, the authors argue for adopting a more modern, free-market approach to residency matchmaking that might better suit the needs of applicants, programs, and the public. They propose allowing students who have been identified by their medical schools as having achieved graduation-level competency to apply to residency programs at any point during the year. Residency programs would set their own application timetables and extend offers in an ongoing fashion. Students, counseled by their schools, would accept or decline offers as desired. The authors argue this approach would better support competency-based education while allowing applicants and programs more choice regarding how they engage and adapt within the selection process. The approach's staggered start times for new residents might attenuate the July effect and improve outcomes for patients. Medical students might also enter and thereby complete residency earlier, increasing the physician workforce.

  12. Genome Size Dynamics and Evolution in Monocots

    Directory of Open Access Journals (Sweden)

    Ilia J. Leitch

    2010-01-01

    Monocot genomic diversity includes striking variation at many levels. This paper compares various genomic characters (e.g., range of chromosome numbers and ploidy levels, occurrence of endopolyploidy, GC content, chromosome packaging and organization, genome size) between monocots and the remaining angiosperms to discern just how distinctive monocot genomes are. One of the most notable features of monocots is their wide range and diversity of genome sizes, including the species with the largest genome so far reported in plants. This genomic character is analysed in greater detail, within a phylogenetic context. By surveying available genome size and chromosome data it is apparent that different monocot orders follow distinctive modes of genome size and chromosome evolution. Further insights into genome size-evolution and dynamics were obtained using statistical modelling approaches to reconstruct the ancestral genome size at key nodes across the monocot phylogenetic tree. Such approaches reveal that while the ancestral genome size of all monocots was small (1C=1.9 pg), there have been several major increases and decreases during monocot evolution. In addition, notable increases in the rates of genome size-evolution were found in Asparagales and Poales compared with other monocot lineages.

  13. Novel approach for deriving genome wide SNP analysis data from archived blood spots

    Science.gov (United States)

    2012-01-01

    Background: The ability to transport and store DNA at room temperature in low volumes has the advantage of optimising cost, time and storage space. Blood spots on adapted filter papers are popular for this, with FTA (Flinders Technology Associates) Whatman™ technology being one of the most recent. Plant material, plasmids, viral particles, bacteria and animal blood have been stored and transported successfully using this technology; however, porcine DNA extraction from FTA Whatman™ cards is a relatively new approach, and its use to prepare nucleic acids for downstream applications such as PCR, whole genome amplification, sequencing and subsequent application to single nucleotide polymorphism microarrays has hitherto been under-explored. Findings: DNA was extracted from FTA Whatman™ cards (following adaptations of the manufacturer’s instructions), whole genome amplified and subsequently analysed to validate the integrity of the DNA for downstream SNP analysis. DNA was successfully extracted from 288/288 samples and amplified by WGA. Allele dropout post WGA was observed in less than 2% of samples and there was no clear evidence of amplification bias or contamination. Acceptable call rates on porcine SNP chips were also achieved using DNA extracted and amplified in this way. Conclusions: DNA extracted from FTA Whatman cards is of a high enough quality and quantity following whole genomic amplification to perform meaningful SNP chip studies. PMID:22974252

  14. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    Directory of Open Access Journals (Sweden)

    M Muksitul Haque

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation
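
    The CpG desert criterion quoted in the abstract above (fewer than 3 CpG per 100 bp) can be illustrated with a short sketch. The window size, the use of non-overlapping windows and the function names are hypothetical choices for illustration, not the annotation procedure used in the study.

```python
def cpg_per_100bp(sequence):
    """Number of CpG dinucleotides per 100 bases of the given sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count("CG") / len(seq)


def cpg_desert_windows(sequence, window=100, step=100, max_density=3.0):
    """Return (start, density) for every full window whose CpG density is
    below max_density. CpGs that straddle a window boundary are not counted."""
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        density = cpg_per_100bp(sequence[start:start + window])
        if density < max_density:
            hits.append((start, density))
    return hits
```

    A smaller `step` than `window` gives overlapping windows, which avoids missing low-density stretches that happen to span a window boundary.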

  15. Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL)

    Science.gov (United States)

    Razali, Nazim; Mustapha, Aida; Yatim, Faiz Ahmad; Aziz, Ruhaya Ab

    2017-08-01

    Modeling association football prediction has become increasingly popular in the last few years, and many different prediction models have been proposed with the aim of evaluating the attributes that lead a football team to lose, draw or win a match. Three types of approaches have been considered for predicting football match results: statistical approaches, machine learning approaches and Bayesian approaches. Lately, many studies of football prediction models have used Bayesian approaches. This paper proposes a Bayesian Network (BN) to predict the results of football matches in terms of home win (H), away win (A) and draw (D). The English Premier League (EPL) for the three seasons 2010-2011, 2011-2012 and 2012-2013 has been selected and reviewed. K-fold cross validation has been used for testing the accuracy of the prediction model. The required information about the football data is sourced from a legitimate site at http://www.football-data.co.uk. BNs achieved a predictive accuracy of 75.09% on average across the three seasons. It is hoped that the results could be used as a benchmark for future research in predicting football match results.

  16. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  17. Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-01-01

    Cotton is one of the most important economic crops and the primary source of natural fiber and is an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of length of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four larger repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than other rosids, and the clade formed by two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility in cotton and other higher plants.

  18. Efficient Topological Localization Using Global and Local Feature Matching

    Directory of Open Access Journals (Sweden)

    Junqiu Wang

    2013-03-01

    We present an efficient vision-based global topological localization approach in which different image features are used in a coarse-to-fine matching framework. Orientation Adjacency Coherence Histogram (OACH), a novel image feature, is proposed to improve the coarse localization. The coarse localization results are taken as inputs for the fine localization which is carried out by matching Harris-Laplace interest points characterized by the SIFT descriptor. The computation of OACHs and interest points is efficient due to the fact that these features are computed in an integrated process. The matching of local features is improved by using an approximate nearest neighbor search technique. We have implemented and tested the localization system in real environments. The experimental results demonstrate that our approach is efficient and reliable in both indoor and outdoor environments. This work has also been compared with previous works. The comparison results show that our approach has better performance with higher correct ratio and lower computational complexity.

  19. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Lincoln D Stein

    2003-11-01

    The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C

  20. Herbarium genomics

    DEFF Research Database (Denmark)

    Bakker, Freek T.; Lei, Di; Yu, Jiaying

    2016-01-01

    Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...

  1. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute’s genomic medicine portfolio

    Science.gov (United States)

    Manolio, Teri A.

    2016-01-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual’s genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of “Genomic Medicine Meetings,” under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI’s genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. PMID:27612677

  2. Secondhand smoke exposure and other correlates of susceptibility to smoking: a propensity score matching approach.

    Science.gov (United States)

    McIntire, Russell K; Nelson, Ashlyn A; Macy, Jonathan T; Seo, Dong-Chul; Kolbe, Lloyd J

    2015-09-01

    Secondhand smoke (SHS) exposure is responsible for numerous diseases of the lungs and other bodily systems among children. In addition to the adverse health effects of SHS exposure, studies show that children exposed to SHS are more likely to smoke in adolescence. Susceptibility to smoking is a measure used to identify adolescent never-smokers who are at risk for smoking. Limited research has been conducted on the influence of SHS on susceptibility to smoking. The purpose of this study was to determine a robust measure of the strength of correlation between SHS exposure and susceptibility to smoking among never-smoking U.S. adolescents. This study used data from the 2009 National Youth Tobacco Survey to identify predictors of susceptibility to smoking in the full (pre-match) sample of adolescents and a smaller (post-match) sample created by propensity score matching. Results showed a significant association between SHS exposure and susceptibility to smoking among never-smoking adolescents in the pre-match (OR=1.47) and post-match (OR=1.52) samples. The odds ratio increase after matching suggests that the strength of the relationship was underestimated in the pre-match sample. Other significant correlates of susceptibility to smoking identified include: gender, race/ethnicity, personal income, smoke-free home rules, number of smoking friends, perception of SHS harm, perceived benefits of smoking, and exposure to pro-tobacco media messages. The use of propensity score matching procedures reduced bias in the post-match sample, and provided a more robust estimate of the influence of SHS exposure on susceptibility to smoking, compared to the pre-match sample estimates. Copyright © 2015 Elsevier Ltd. All rights reserved.
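
    The matching and effect-estimate steps described above can be sketched in a simplified form. The example assumes propensity scores have already been estimated, and uses greedy 1:1 nearest-neighbour matching without replacement within a caliper, followed by a plain 2x2 odds ratio; the study itself may have used a different matching algorithm, and the function names, caliper value and counts in the usage note are hypothetical.

```python
def greedy_match(treated_scores, control_scores, caliper=0.05):
    """Pair each treated unit with the closest unused control unit.

    Returns a list of (treated_index, control_index); treated units with no
    control within the caliper are left unmatched.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break
        # Ties in distance are broken by the smaller control index.
        c_idx = min(available, key=lambda j: (abs(control_scores[j] - t_score), j))
        if abs(control_scores[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            available.remove(c_idx)
    return pairs


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table of exposure by outcome."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
```

    With made-up counts of 30 susceptible and 70 non-susceptible adolescents among the exposed, and 20 and 80 among the unexposed, `odds_ratio(30, 70, 20, 80)` is about 1.71. Comparing such an estimate before and after matching is the kind of pre-match versus post-match contrast the abstract reports.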

  3. Genome technologies and personalized dental medicine.

    Science.gov (United States)

    Eng, G; Chen, A; Vess, T; Ginsburg, G S

    2012-04-01

    The addition of genomic information to our understanding of oral disease is driving important changes in oral health care. It is anticipated that genome-derived information will promote a deeper understanding of disease etiology and permit earlier diagnosis, allowing for preventative measures prior to disease onset rather than treatment that attempts to repair the diseased state. Advances in genome technologies have fueled expectations for this proactive healthcare approach. Application of genomic testing is expanding and has already begun to find its way into the practice of clinical dentistry. To take full advantage of the information and technologies currently available, it is vital that dental care providers, consumers, and policymakers be aware of genomic approaches to understanding of oral diseases and the application of genomic testing to disease diagnosis and treatment. Ethical, legal, clinical, and educational initiatives are also required to responsibly incorporate genomic information into the practice of dentistry. This article provides an overview of the application of genomic technologies to oral health care and introduces issues that require consideration if we are to realize the full potential of genomics to enable the practice of personalized dental medicine. © 2011 John Wiley & Sons A/S.

  4. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  5. Merging Marine Ecosystem Models and Genomics

    Science.gov (United States)

    Coles, V.; Hood, R. R.; Stukel, M. R.; Moran, M. A.; Paul, J. H.; Satinsky, B.; Zielinski, B.; Yager, P. L.

    2015-12-01

    oceanography. One of the grand challenges of oceanography is to develop model techniques to more effectively incorporate genomic information. As one approach, we developed an ecosystem model whose community is determined by randomly assigning functional genes to build each organism's "DNA". Microbes are assigned a size that sets their baseline environmental responses using allometric response curves. These responses are modified by the costs and benefits conferred by each gene in an organism's genome. The microbes are embedded in a general circulation model where environmental conditions shape the emergent population. This model is used to explore whether organisms constructed from randomized combinations of metabolic capability alone can self-organize to create realistic oceanic biogeochemical gradients. Realistic community size spectra and chlorophyll-a concentrations emerge in the model. The model is run repeatedly with randomly-generated microbial communities and each time realistic gradients in community size spectra, chlorophyll-a, and forms of nitrogen develop. This supports the hypothesis that the metabolic potential of a community rather than the realized species composition is the primary factor setting vertical and horizontal environmental gradients. Vertical distributions of nitrogen and transcripts for genes involved in nitrification are broadly consistent with observations. Modeled gene and transcript abundance for nitrogen cycling and processing of land-derived organic material match observations along the extreme gradients in the Amazon River plume, and they help to explain the factors controlling observed variability.

  6. Comparative genomic and proteomic analysis of high grade glioma primary cultures and matched tumor in situ.

    LENUS (Irish Health Repository)

    Howley, R

    2012-10-15

    Developing targeted therapies for high grade gliomas (HGG), the most common primary brain tumor in adults, relies largely on glioma cultures. However, it is unclear if HGG tumorigenic signaling pathways are retained under in-vitro conditions. Using array comparative genomic hybridization and immunohistochemical profiling, we contrasted the epidermal and platelet-derived growth factor receptor (EGFR/PDGFR) in-vitro pathway status of twenty-six primary HGG cultures with the pathway status of their original HGG biopsies. Genomic gains or amplifications were lost during culturing while genomic losses were more likely to be retained. Loss of EGFR amplification was further verified immunohistochemically when EGFR overexpression was decreased in the majority of cultures. Conversely, PDGFRα and PDGFRβ were more abundantly expressed in primary cultures than in the original tumor (p<0.05). Despite these genomic and proteomic differences, primary HGG cultures retained key aspects of dysregulated tumorigenic signaling. Both in vivo and in vitro, the presence of EGFR resulted in downstream activation of P70s6K while reduced downstream activation was associated with the presence of PDGFR and the tumor suppressor, PTEN. The preserved pathway dysregulation makes this glioma model suitable for further studies of glioma tumorigenesis; however, individual culture-related differences must be taken into consideration when testing responsiveness to chemotherapeutic agents.

  7. Genome-first approach diagnosed Cabezas syndrome via novel CUL4B mutation detection.

    Science.gov (United States)

    Okamoto, Nobuhiko; Watanabe, Miki; Naruto, Takuya; Matsuda, Keiko; Kohmoto, Tomohiro; Saito, Masako; Masuda, Kiyoshi; Imoto, Issei

    2017-01-01

    Cabezas syndrome is a syndromic form of X-linked intellectual disability primarily characterized by a short stature, hypogonadism and abnormal gait, with other variable features resulting from mutations in the CUL4B gene. Here, we report a clinically undiagnosed 5-year-old male with severe intellectual disability. A genome-first approach using targeted exome sequencing identified a novel nonsense mutation [NM_003588.3:c.2698G>T, p.(Glu900*)] in the last coding exon of CUL4B, thus diagnosing this patient with Cabezas syndrome.

  8. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
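
    The two-model idea in this abstract can be illustrated with a minimal sketch. It is not the GeneMark.hmm implementation: it assumes simple first-order Markov chains over nucleotides, toy training sequences, and hypothetical function names (train_markov, log_likelihood, classify). A sequence is labeled by whichever model assigns it the higher log-likelihood.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    A pseudocount is added to every transition so that unseen
    transitions never get probability zero.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model

def log_likelihood(seq, model):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def classify(seq, typical, atypical):
    """Label a sequence by the model that explains it better."""
    if log_likelihood(seq, typical) >= log_likelihood(seq, atypical):
        return "typical"
    return "atypical"

# Toy training data (hypothetical): GC-rich "typical" genes, AT-rich "atypical" genes.
typical_model = train_markov(["GCGCGGCCGCGC", "CGCGGCGCCGGC"])
atypical_model = train_markov(["ATATTAATATTA", "TATAATTATAAT"])

print(classify("GCGGCGCC", typical_model, atypical_model))
print(classify("ATTATAAT", typical_model, atypical_model))
```

    Real gene finders use higher-order, codon-aware models; the sketch only shows how two competing composition models partition a gene set.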

  9. Self-Similar Spin Images for Point Cloud Matching

    Science.gov (United States)

    Pulido, Daniel

    based on the concept of self-similarity to aid in the scale and feature matching steps. An open problem in fusion is how best to extract features from two point clouds and then perform feature-based matching. The proposed approach for this matching step is the use of local self-similarity as an invariant measure to match features. In particular, the proposed approach is to combine the concept of local self-similarity with a well-known feature descriptor, Spin Images, and thereby define "Self-Similar Spin Images". This approach is then extended to the case of matching two point clouds in very different coordinate systems (e.g., a geo-referenced Lidar point cloud and stereo-image derived point cloud without geo-referencing). The use of Self-Similar Spin Images is again applied to address this problem by introducing a "Self-Similar Keyscale" that matches the spatial scales of two point clouds. Another open problem is how best to detect changes in content between two point clouds. A method is proposed to find changes between two point clouds by analyzing the order statistics of the nearest neighbors between the two clouds, and thereby define the "Nearest Neighbor Order Statistic" method. Note that the well-known Hausdorff distance is a special case as being just the maximum order statistic. Therefore, by studying the entire histogram of these nearest neighbors it is expected to yield a more robust method to detect points that are present in one cloud but not the other. This approach is applied at multiple resolutions. Therefore, changes detected at the coarsest level will yield large missing targets and at finer levels will yield smaller targets.
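
    The relationship between nearest-neighbor order statistics and the Hausdorff distance can be made concrete with a small sketch. It is an illustration under stated assumptions, not the author's method: it uses a brute-force neighbor search, toy coordinates, and hypothetical function names. The maximum order statistic taken in one direction is the directed (one-sided) Hausdorff distance; the symmetric version is the larger of the two directions.

```python
import math

def nearest_neighbor_distances(source, target):
    """For each point in source, the distance to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]

def order_statistics(source, target):
    """Nearest-neighbor distances sorted in ascending order."""
    return sorted(nearest_neighbor_distances(source, target))

def directed_hausdorff(source, target):
    """The maximum order statistic: the worst-case nearest-neighbor distance."""
    return order_statistics(source, target)[-1]

# Toy clouds (hypothetical): cloud_a contains one point that cloud_b lacks.
cloud_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
cloud_b = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]

stats = order_statistics(cloud_a, cloud_b)
print(stats)                                # three zeros, then one large distance
print(directed_hausdorff(cloud_a, cloud_b))
```

    Looking at the whole sorted list rather than only its last element shows how many points are far from the other cloud, which is the extra information the abstract attributes to the full histogram.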

  10. Simultaneous gene finding in multiple genomes.

    Science.gov (United States)

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/. Contact: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R

    OpenAIRE

    Pfeifer, Bastian; Wittelsbürger, Ulrich; Ramos-Onsins, Sebastian E.; Lercher, Martin J.

    2014-01-01

    Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a populati...

  12. Use of comparative genomics approaches to characterize interspecies differences in response to environmental chemicals: Challenges, opportunities, and research needs

    International Nuclear Information System (INIS)

    Burgess-Herbert, Sarah L.; Euling, Susan Y.

    2013-01-01

    A critical challenge for environmental chemical risk assessment is the characterization and reduction of uncertainties introduced when extrapolating inferences from one species to another. The purpose of this article is to explore the challenges, opportunities, and research needs surrounding the issue of how genomics data and computational and systems level approaches can be applied to inform differences in response to environmental chemical exposure across species. We propose that the data, tools, and evolutionary framework of comparative genomics be adapted to inform interspecies differences in chemical mechanisms of action. We compare and contrast existing approaches, from disciplines as varied as evolutionary biology, systems biology, mathematics, and computer science, that can be used, modified, and combined in new ways to discover and characterize interspecies differences in chemical mechanism of action which, in turn, can be explored for application to risk assessment. We consider how genetic, protein, pathway, and network information can be interrogated from an evolutionary biology perspective to effectively characterize variations in biological processes of toxicological relevance among organisms. We conclude that comparative genomics approaches show promise for characterizing interspecies differences in mechanisms of action, and further, for improving our understanding of the uncertainties inherent in extrapolating inferences across species in both ecological and human health risk assessment. To achieve long-term relevance and consistent use in environmental chemical risk assessment, improved bioinformatics tools, computational methods robust to data gaps, and quantitative approaches for conducting extrapolations across species are critically needed. Specific areas ripe for research to address these needs are recommended.

  13. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Directory of Open Access Journals (Sweden)

    Marko Budinich

    Full Text Available Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Subsequent models will provide insights into behaviors (including diversity) that take place at the ecosystem scale.

  14. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genomes.

  15. Genome engineering for microbial natural product discovery.

    Science.gov (United States)

    Choi, Si-Sun; Katsuyama, Yohei; Bai, Linquan; Deng, Zixin; Ohnishi, Yasuo; Kim, Eung-Soo

    2018-03-03

    The discovery and development of microbial natural products (MNPs) have played pivotal roles in the fields of human medicine and its related biotechnology sectors over the past several decades. The post-genomic era has witnessed the development of microbial genome mining approaches to isolate previously unsuspected MNP biosynthetic gene clusters (BGCs) hidden in the genome, followed by various BGC awakening techniques to visualize compound production. Additional microbial genome engineering techniques have allowed higher MNP production titers, which could complement a traditional culture-based MNP chasing approach. Here, we describe recent developments in the MNP research paradigm, including microbial genome mining, NP BGC activation, and NP overproducing cell factory design. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates

    Energy Technology Data Exchange (ETDEWEB)

    Nordberg, Henrik [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Cantor, Michael [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Dusheyko, Serge [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Hua, Susan [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Poliakov, Alexander [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Shabalov, Igor [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Smirnova, Tatyana [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Grigoriev, Igor V. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Dubchak, Inna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)

    2013-11-12

    The U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility, serves the diverse scientific community by providing integrated high-throughput sequencing and computational analysis to enable system-based scientific approaches in support of DOE missions related to clean energy generation and environmental characterization. The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. The JGI maintains extensive data management systems and specialized analytical capabilities to manage and interpret complex genomic data. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. In this paper, we describe major updates of the Genome Portal in the past 2 years with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI.

  17. Exceptionally diverse morphotypes and genomes of crenarchaeal hyperthermophilic viruses

    DEFF Research Database (Denmark)

    Prangishvili, D; Garrett, R A

    2004-01-01

    and Rudiviridae. They all have double-stranded DNA genomes and infect hyperthermophilic crenarchaea of the orders Sulfolobales and Thermoproteales. Representatives of the different viral families share a few homologous ORFs (open reading frames). However, about 90% of all ORFs in the seven sequenced genomes show...... no significant matches to sequences in public databases. This suggests that these hyperthermophilic viruses have exceptional biochemical solutions for biological functions. Specific features of genome organization, as well as strategies for DNA replication, suggest that phylogenetic relationships exist between...... crenarchaeal rudiviruses and the large eukaryal DNA viruses: poxviruses, the African swine fever virus and Chlorella viruses. Sequence patterns at the ends of the linear genome of the lipothrixvirus AFV1 are reminiscent of the telomeric ends of linear eukaryal chromosomes and suggest that a primitive telomeric...

  18. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  19. Towards 3D Face Recognition in the Real: A Registration-Free Approach Using Fine-Grained Matching of 3D Keypoint Descriptors

    KAUST Repository

    Li, Huibin

    2014-11-12

    Registration algorithms performed on point clouds or range images of face scans have been successfully used for automatic 3D face recognition under expression variations, but have rarely been investigated to solve pose changes and occlusions, mainly because the basic landmarks needed to initialize coarse alignment are not always available. Recently, local feature-based SIFT-like matching has proved competent to handle all such variations without registration. In this paper, towards 3D face recognition for real-life biometric applications, we significantly extend the SIFT-like matching framework to mesh data and propose a novel approach using fine-grained matching of 3D keypoint descriptors. First, two principal curvature-based 3D keypoint detectors are provided, which can repeatedly identify complementary locations on a face scan where local curvatures are high. Then, a robust 3D local coordinate system is built at each keypoint, which allows extraction of pose-invariant features. Three keypoint descriptors, corresponding to three surface differential quantities, are designed, and their feature-level fusion is employed to comprehensively describe local shapes of detected keypoints. Finally, we propose a multi-task sparse representation based fine-grained matching algorithm, which accounts for the average reconstruction error of probe face descriptors sparsely represented by a large dictionary of gallery descriptors in identification. Our approach is evaluated on the Bosphorus database and achieves rank-one recognition rates of 96.56, 98.82, 91.14, and 99.21% on the entire database, and the expression, pose, and occlusion subsets, respectively. To the best of our knowledge, these are the best results reported so far on this database. Additionally, good generalization ability is also exhibited by the experiments on the FRGC v2.0 database.

  20. FEATURE MATCHING OF HISTORICAL IMAGES BASED ON GEOMETRY OF QUADRILATERALS

    Directory of Open Access Journals (Sweden)

    F. Maiwald

    2018-05-01

    Full Text Available This contribution shows an approach to match historical images from the photo library of the Saxon State and University Library Dresden (SLUB) in the context of a historical three-dimensional city model of Dresden. In comparison to recent images, historical photography provides diverse factors which make an automatic image analysis (feature detection, feature matching and relative orientation of images) difficult. Due to e.g. film grain, dust particles or the digitalization process, historical images are often covered by noise interfering with the image signal needed for a robust feature matching. The presented approach uses quadrilaterals in image space as these are commonly available in man-made structures and façade images (windows, stones, claddings). It is explained how to generally detect quadrilaterals in images. Consequently, the properties of the quadrilaterals as well as the relationship to neighbouring quadrilaterals are used for the description and matching of feature points. The results show that most of the matches are robust and correct but still small in numbers.

  1. Automated design of genomic Southern blot probes

    Directory of Open Access Journals (Sweden)

    Komiyama Noboru H

    2010-01-01

    Full Text Available Abstract Background Southern blotting is a DNA analysis technique that has found widespread application in molecular biology. It has been used for gene discovery and mapping and has diagnostic and forensic applications, including mutation detection in patient samples and DNA fingerprinting in criminal investigations. Southern blotting has been employed as the definitive method for detecting transgene integration, and successful homologous recombination in gene targeting experiments. The technique employs a labeled DNA probe to detect a specific DNA sequence in a complex DNA sample that has been separated by restriction-digest and gel electrophoresis. Critically, for the technique to succeed, the probe must be unique to the target locus so as not to cross-hybridize to other endogenous DNA within the sample. Investigators routinely employ a manual approach to probe design. A genome browser is used to extract DNA sequence from the locus of interest, which is searched against the target genome using a BLAST-like tool. Ideally a single perfect match is obtained to the target, with little cross-reactivity caused by homologous DNA sequence present in the genome and/or repetitive and low-complexity elements in the candidate probe. This is a labor-intensive process often requiring several attempts to find a suitable probe for laboratory testing. Results We have written an informatic pipeline to automatically design genomic Southern blot probes that specifically attempts to optimize the resultant probe, employing a brute-force strategy of generating many candidate probes of acceptable length in the user-specified design window, searching all against the target genome, then scoring and ranking the candidates by uniqueness and repetitive DNA element content. Using these in silico measures we can automatically design probes that we predict to perform as well, or better, than our previous manual designs, while considerably reducing design time. We went on to
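
    The brute-force strategy described above (enumerate candidate probes in a design window, search each against the genome, then rank by uniqueness and repeat content) can be sketched as follows. This is a toy illustration, not the authors' pipeline: exact substring counting stands in for a BLAST-like search, homopolymer runs stand in for repetitive and low-complexity elements, and all names and the example genome are hypothetical.

```python
def count_occurrences(genome, probe):
    """Count exact (possibly overlapping) matches of probe in genome."""
    count = 0
    start = genome.find(probe)
    while start != -1:
        count += 1
        start = genome.find(probe, start + 1)
    return count

def repeat_fraction(probe, run_length=4):
    """Fraction of bases lying in homopolymer runs of at least run_length."""
    flagged = 0
    i = 0
    while i < len(probe):
        j = i
        while j < len(probe) and probe[j] == probe[i]:
            j += 1
        if j - i >= run_length:
            flagged += j - i
        i = j
    return flagged / len(probe)

def rank_probes(genome, window_start, window_end, probe_len):
    """Score every candidate in the window; best candidates sort first.

    Sorting key: fewest genome hits, then lowest repeat fraction,
    then leftmost start position.
    """
    candidates = []
    for s in range(window_start, window_end - probe_len + 1):
        probe = genome[s:s + probe_len]
        hits = count_occurrences(genome, probe)
        candidates.append((hits, repeat_fraction(probe), s, probe))
    return sorted(candidates)

# Toy genome (hypothetical) with a run of T's in the middle.
genome = "ACGTACGT" + "TTTTTT" + "GACTCAGG"
ranked = rank_probes(genome, 0, len(genome), probe_len=6)
best = ranked[0]
print(best)
```

    A real pipeline would score approximate matches and annotated repeats rather than exact hits and homopolymers, but the enumerate, search, and rank structure is the same.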

  2. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Jolly Emmitt R

    2005-11-01

    Full Text Available Abstract Background A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.

  3. Finding function: evaluation methods for functional genomic data

    Directory of Open Access Journals (Sweden)

    Barrett Daniel R

    2006-07-01

    Full Text Available Abstract Background Accurate evaluation of the quality of genomic or proteomic data and computational methods is vital to our ability to use them for formulating novel biological hypotheses and directing further experiments. There is currently no standard approach to evaluation in functional genomics. Our analysis of existing approaches shows that they are inconsistent and contain substantial functional biases that render the resulting evaluations misleading both quantitatively and qualitatively. These problems make it essentially impossible to compare computational methods or large-scale experimental datasets and also result in conclusions that generalize poorly in most biological applications. Results We reveal issues with current evaluation methods here and suggest new approaches to evaluation that facilitate accurate and representative characterization of genomic methods and data. Specifically, we describe a functional genomics gold standard based on curation by expert biologists and demonstrate its use as an effective means of evaluation of genomic approaches. Our evaluation framework and gold standard are freely available to the community through our website. Conclusion Proper methods for evaluating genomic data and computational approaches will determine how much we, as a community, are able to learn from the wealth of available data. We propose one possible solution to this problem here but emphasize that this topic warrants broader community discussion.

  4. Practical Approaches for Detecting Selection in Microbial Genomes.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2016-02-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  5. Genomics-assisted breeding in fruit trees.

    Science.gov (United States)

    Iwata, Hiroyoshi; Minamikawa, Mai F; Kajiya-Kanegae, Hiromi; Ishimori, Motoyuki; Hayashi, Takeshi

    2016-01-01

    Recent advancements in genomic analysis technologies have opened up new avenues to promote the efficiency of plant breeding. Novel genomics-based approaches for plant breeding and genetics research, such as genome-wide association studies (GWAS) and genomic selection (GS), are useful, especially in fruit tree breeding. The breeding of fruit trees is hindered by their long generation time, large plant size, long juvenile phase, and the necessity to wait for the physiological maturity of the plant to assess the marketable product (fruit). In this article, we describe the potential of genomics-assisted breeding, which uses these novel genomics-based approaches, to break through these barriers in conventional fruit tree breeding. We first introduce the molecular marker systems and whole-genome sequence data that are available for fruit tree breeding. Next we introduce the statistical methods for biparental linkage and quantitative trait locus (QTL) mapping as well as GWAS and GS. We then review QTL mapping, GWAS, and GS studies conducted on fruit trees. We also review novel technologies for rapid generation advancement. Finally, we note the future prospects of genomics-assisted fruit tree breeding and problems that need to be overcome in the breeding.

  6. An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

    Science.gov (United States)

    Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

    2017-10-06

    Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.

  7. IMPROVED REAL-TIME SCAN MATCHING USING CORNER FEATURES

    Directory of Open Access Journals (Sweden)

    H. A. Mohamed

    2016-06-01

    Full Text Available The automation of unmanned vehicle operation has gained a lot of research attention in the last few years because of its numerous applications. Vehicle localization is more challenging in indoor environments where absolute positioning measurements (e.g. GPS) are typically unavailable. Laser range finders are among the most widely used sensors that help unmanned vehicles to localize themselves in indoor environments. Typically, automatic real-time matching of the successive scans is performed either explicitly or implicitly by any localization approach that utilizes laser range finders. Many established approaches such as Iterative Closest Point (ICP), Iterative Matching Range Point (IMRP), Iterative Dual Correspondence (IDC), and Polar Scan Matching (PSM) handle the scan matching problem in an iterative fashion, which significantly affects the time consumption. Furthermore, the solution convergence is not guaranteed, especially in cases of sharp maneuvers or fast movement. This paper proposes an automated real-time scan matching algorithm where the matching process is initialized using the detected corners. This initialization step aims to increase the convergence probability and to limit the number of iterations needed to reach convergence. The corner detection is preceded by line extraction from the laser scans. To evaluate the probability of line availability in indoor environments, various data sets, offered by different research groups, have been tested, and the mean numbers of extracted lines per scan for these data sets range from 4.10 to 8.86 lines of more than 7 points. All intersections between extracted lines are detected as corners regardless of the physical intersection of these line segments in the scan. To account for the uncertainties of the detected corners, the covariance of the corners is estimated using the extracted lines variances. The detected corners are used to estimate the transformation parameters
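
    The corner-detection step (taking every intersection of the extracted lines as a corner, whether or not the segments physically meet) can be sketched as below. This is an assumption-laden illustration rather than the paper's algorithm: lines are assumed to be given in normal form (rho, theta), meaning x*cos(theta) + y*sin(theta) = rho, and the function names are hypothetical.

```python
import math
from itertools import combinations

def intersect(line_a, line_b, eps=1e-9):
    """Intersection of two infinite lines in (rho, theta) normal form.

    Solves the 2x2 linear system by Cramer's rule and returns None
    when the lines are (nearly) parallel.
    """
    (r1, t1), (r2, t2) = line_a, line_b
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)
    if abs(det) < eps:
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return (x, y)

def detect_corners(lines):
    """All pairwise intersections of the extracted lines."""
    corners = []
    for a, b in combinations(lines, 2):
        point = intersect(a, b)
        if point is not None:
            corners.append(point)
    return corners

# Toy scan (hypothetical): walls x = 2 and x = 5, plus wall y = 3.
lines = [(2.0, 0.0), (3.0, math.pi / 2), (5.0, 0.0)]
corners = detect_corners(lines)
print(corners)
```

    The two parallel walls produce no corner, while each of them yields one corner with the perpendicular wall.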

  8. Revealing the biotechnological potential of Delftia sp. JD2 by a genomic approach

    Directory of Open Access Journals (Sweden)

    María A. Morel

    2016-04-01

    Full Text Available Delftia sp. JD2 is a chromium-resistant bacterium that reduces Cr(VI) to Cr(III), accumulates Pb(II), produces the phytohormone indole-3-acetic acid and siderophores, and increases the plant growth performance of rhizobia in co-inoculation experiments. We aimed to analyze the biotechnological potential of JD2 using a genomic approach. JD2 has a genome of 6.76 Mb, with 6,051 predicted protein-coding sequences and 93 RNA genes (tRNA and rRNA). The indole-acetamide pathway was identified as responsible for the synthesis of indole-3-acetic acid. The genetic information involved in chromium resistance (the gene cluster chrBACF) was found. At least 40 putative genes encoding TonB-dependent receptors, probably involved in the utilization of siderophores and biopolymers, and genes for the synthesis, maturation, export and uptake of pyoverdine, and acquisition of Fe-pyochelin and Fe-enterobactin were also identified. The information also suggests that JD2 produces polyhydroxybutyrate, a carbon reserve polymer commonly used for manufacturing petrochemical-free bioplastics. In addition, JD2 may degrade lignin-derived aromatic compounds to 2-pyrone-4,6-dicarboxylate, a molecule used in the bio-based polymer industry. Finally, a comparative genomic analysis of JD2, Delftia sp. Cs1-4 and Delftia acidovorans SPH-1 is also discussed. The present work provides insights into the physiology and genetics of a microorganism with many potential uses in biotechnology.

  9. Combining machine learning and matching techniques to improve causal inference in program evaluation.

    Science.gov (United States)

    Linden, Ariel; Yarnold, Paul R

    2016-12-01

    Program evaluations often utilize various matching approaches to emulate the randomization process for group assignment in experimental studies. Typically, the matching strategy is implemented, and then covariate balance is assessed before estimating treatment effects. This paper introduces a novel analytic framework utilizing a machine learning algorithm called optimal discriminant analysis (ODA) for assessing covariate balance and estimating treatment effects, once the matching strategy has been implemented. This framework holds several key advantages over the conventional approach: application to any variable metric and number of groups; insensitivity to skewed data or outliers; and use of accuracy measures applicable to all prognostic analyses. Moreover, ODA accepts analytic weights, thereby extending the methodology to any study design where weights are used for covariate adjustment or more precise (differential) outcome measurement. One-to-one matching on the propensity score was used as the matching strategy. Covariate balance was assessed using standardized difference in means (conventional approach) and measures of classification accuracy (ODA). Treatment effects were estimated using ordinary least squares regression and ODA. Using empirical data, ODA produced results highly consistent with those obtained via the conventional methodology for assessing covariate balance and estimating treatment effects. When ODA is combined with matching techniques within a treatment effects framework, the results are consistent with conventional approaches. However, given that it provides additional dimensions and robustness to the analysis versus what can currently be achieved using conventional approaches, ODA offers an appealing alternative. © 2016 John Wiley & Sons, Ltd.
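
    The conventional balance check named in this abstract, the standardized difference in means, can be illustrated with a minimal sketch. This is not the authors' code and does not implement ODA; the function name, the sample ages, and the pooled-variance form of the denominator are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute standardized difference in means for one covariate.

    Assumed form: |mean_t - mean_c| / sqrt((var_t + var_c) / 2),
    using sample variances of the two matched groups.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has zero variance in both groups")
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate (age) for matched treated and control units.
treated_age = [52, 61, 47, 58, 66, 49]
control_age = [50, 63, 45, 59, 64, 51]

smd = standardized_mean_difference(treated_age, control_age)
print(f"standardized difference in means: {smd:.3f}")
```

    A value near zero suggests the covariate is balanced after matching; the cut-off used to judge balance is a study-level choice and is not taken from this record.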

  10. A review of the Match technique as applied to AASE-2/EASOE and SOLVE/THESEO 2000

    Directory of Open Access Journals (Sweden)

    G. A. Morris

    2005-01-01

    Full Text Available We apply the NASA Goddard Trajectory Model to data from a series of ozonesondes to derive ozone loss rates in the lower stratosphere for the AASE-2/EASOE mission (January-March 1992) and for the SOLVE/THESEO 2000 mission (January-March 2000) in an approach similar to Match. Ozone loss rates are computed by comparing the ozone concentrations provided by ozonesondes launched at the beginning and end of the trajectories connecting the launches. We investigate the sensitivity of the Match results to the various parameters used to reject potential matches in the original Match technique. While these filters effectively eliminate from consideration 80% of the matched sonde pairs and >99% of matched observations in our study, we conclude that only a filter based on potential vorticity changes along the calculated back trajectories seems warranted. Our study also demonstrates that the ozone loss rates estimated in Match can vary by up to a factor of two depending upon the precise trajectory paths calculated for each trajectory. As a result, the statistical uncertainties published with previous Match results might need to be augmented by an additional systematic error. The sensitivity to the trajectory path is particularly pronounced in the month of January, for which the largest ozone loss rate discrepancies between photochemical models and Match are found. For most of the two study periods, our ozone loss rates agree with those previously published. Notable exceptions are found for January 1992 at 475K and late February/early March 2000 at 450K, both periods during which we generally find smaller loss rates than the previous Match studies. Integrated ozone loss rates estimated by Match in both of those years compare well with those found in numerous other studies and in a potential vorticity/potential temperature approach shown previously and in this paper. Finally, we suggest an alternate approach to Match using trajectory mapping. This approach uses

  11. Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates.

    Science.gov (United States)

    Onogi, Akio; Watanabe, Maya; Mochizuki, Toshihiro; Hayashi, Takeshi; Nakagawa, Hiroshi; Hasegawa, Toshihiro; Iwata, Hiroyoshi

    2016-04-01

    It is suggested that accuracy in predicting plant phenotypes can be improved by integrating genomic prediction with crop modelling in a single hierarchical model. Accurate prediction of phenotypes is important for plant breeding and management. Although genomic prediction/selection aims to predict phenotypes on the basis of whole-genome marker information, it is often difficult to predict phenotypes of complex traits in diverse environments, because plant phenotypes are often influenced by genotype-environment interaction. A possible remedy is to integrate genomic prediction with crop/ecophysiological modelling, which enables us to predict plant phenotypes using environmental and management information. To this end, in the present study, we developed a novel method for integrating genomic prediction with phenological modelling of Asian rice (Oryza sativa, L.), allowing the heading date of untested genotypes in untested environments to be predicted. The method simultaneously infers the phenological model parameters and whole-genome marker effects on the parameters in a Bayesian framework. By cultivating backcross inbred lines of Koshihikari × Kasalath in nine environments, we evaluated the potential of the proposed method in comparison with conventional genomic prediction, phenological modelling, and two-step methods that applied genomic prediction to phenological model parameters inferred from Nelder-Mead or Markov chain Monte Carlo algorithms. In predicting heading dates of untested lines in untested environments, the proposed and two-step methods tended to provide more accurate predictions than the conventional genomic prediction methods, particularly in environments where phenotypes from environments similar to the target environment were unavailable for training genomic prediction. The proposed method showed greater accuracy in prediction than the two-step methods in all cross-validation schemes tested, suggesting the potential of the integrated approach in

  12. The other side of comparative genomics: genes with no orthologs between the cow and other mammalian species

    Directory of Open Access Journals (Sweden)

    Ajmone-Marsan Paolo

    2009-12-01

    Full Text Available Abstract Background With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genomes may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. Results A BLAST-based method was used to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high-quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach based on sequence similarity and genomic comparison. Conclusions The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may have orthologs that were missed in the annotation process, despite having a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology, which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.

  13. A Novel Real-Time Reference Key Frame Scan Matching Method

    Directory of Open Access Journals (Sweden)

    Haytham Mohamed

    2017-05-01

    Full Text Available Unmanned aerial vehicles represent an effective technology for indoor search and rescue operations. Typically, most indoor mission environments are unknown, unstructured, and/or dynamic. Navigation of UAVs in such environments is addressed by simultaneous localization and mapping using either local or global approaches. Both approaches suffer from accumulated errors and high processing time due to the iterative nature of the scan matching method. Moreover, point-to-point scan matching is prone to outlier association processes. This paper proposes a low-cost novel method for 2D real-time scan matching based on a reference key frame (RKF). RKF is a hybrid scan matching technique comprising feature-to-feature and point-to-point approaches. This algorithm aims at mitigating error accumulation using the key frame technique, which is inspired by the video streaming broadcast process. The algorithm depends on the iterative closest point algorithm when linear features are lacking, which is typical of unstructured environments. The algorithm switches back to the RKF once linear features are detected. To validate and evaluate the algorithm, the mapping performance and time consumption are compared with various algorithms in static and dynamic environments. The algorithm exhibits promising navigational and mapping results and very short computational time, which indicates the potential use of the new algorithm with real-time systems.
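
    Both feature-to-feature and point-to-point scan matching rely on a common step: given paired 2D points from two scans, find the rotation and translation that best align them. The sketch below shows the closed-form least-squares solution for that step only. It is not the RKF algorithm; the function name and the sample corner coordinates are hypothetical, and the pairing of points is assumed to be known already.

```python
from math import atan2, cos, sin


def estimate_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    source and target are equal-length lists of (x, y) tuples, where
    source[i] is assumed to correspond to target[i].
    Returns (theta, tx, ty) such that target ~= R(theta) * source + (tx, ty).
    """
    n = len(source)
    scx = sum(p[0] for p in source) / n
    scy = sum(p[1] for p in source) / n
    tcx = sum(q[0] for q in target) / n
    tcy = sum(q[1] for q in target) / n

    s_dot = 0.0    # sum of dot products of centered pairs
    s_cross = 0.0  # sum of 2D cross products of centered pairs
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - scx, py - scy
        bx, by = qx - tcx, qy - tcy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    theta = atan2(s_cross, s_dot)
    # Translation carries the rotated source centroid onto the target centroid.
    tx = tcx - (cos(theta) * scx - sin(theta) * scy)
    ty = tcy - (sin(theta) * scx + cos(theta) * scy)
    return theta, tx, ty


# Hypothetical corners seen in one scan, then in the next scan after the
# vehicle's pose changed by a 0.3 rad rotation and a (0.5, -0.2) shift.
corners_prev = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
true_theta, true_tx, true_ty = 0.3, 0.5, -0.2
corners_next = [
    (cos(true_theta) * x - sin(true_theta) * y + true_tx,
     sin(true_theta) * x + cos(true_theta) * y + true_ty)
    for x, y in corners_prev
]

theta, tx, ty = estimate_rigid_2d(corners_prev, corners_next)
print(f"theta={theta:.3f} rad, t=({tx:.3f}, {ty:.3f})")
```

    An iterative closest point loop repeats this estimate after every nearest-neighbour pairing, whereas matched features such as line intersections let it be solved once, which is one way to read the time savings the abstract reports.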

  14. Unsupervised image matching based on manifold alignment.

    Science.gov (United States)

    Pei, Yuru; Huang, Fengchun; Shi, Fuhao; Zha, Hongbin

    2012-08-01

    This paper addresses the problem of automatic matching between two image sets with similar intrinsic structures and different appearances, especially when there is no prior correspondence. An unsupervised manifold alignment framework is proposed to establish correspondence between data sets by a mapping function in the mutual embedding space. We introduce a local similarity metric based on parameterized distance curves to represent the connection of one point with the rest of the manifold. A small set of valid feature pairs can be found without manual interaction by matching the distance curve of one manifold with the curve cluster of the other manifold. To avoid potential confusion in image matching, we propose an extended affine transformation to solve the nonrigid alignment in the embedding space. Comparatively tight alignment and structure preservation can be obtained simultaneously. The point pairs with the minimum distance after alignment are taken as the matches. We apply manifold alignment to image set matching problems. The correspondence between image sets of different poses, illuminations, and identities can be established effectively by our approach.

  15. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach.

    Science.gov (United States)

    Kohl, Thomas A; Diel, Roland; Harmsen, Dag; Rothgänger, Jörg; Walter, Karen Meywald; Merker, Matthias; Weniger, Thomas; Niemann, Stefan

    2014-07-01

    Whole-genome sequencing (WGS) allows for effective tracing of Mycobacterium tuberculosis complex (MTBC) (tuberculosis pathogens) transmission. However, it is difficult to standardize and, therefore, is not yet employed for interlaboratory prospective surveillance. To allow its widespread application, solutions for data standardization and storage in an easily expandable database are urgently needed. To address this question, we developed a core genome multilocus sequence typing (cgMLST) scheme for clinical MTBC isolates using the Ridom SeqSphere(+) software, which transfers the genome-wide single nucleotide polymorphism (SNP) diversity into an allele numbering system that is standardized, portable, and not computationally intensive. To test its performance, we performed WGS analysis of 26 isolates with identical IS6110 DNA fingerprints and spoligotyping patterns from a longitudinal outbreak in the federal state of Hamburg, Germany (notified between 2001 and 2010). The cgMLST approach (3,041 genes) discriminated the 26 strains with a resolution comparable to that of SNP-based WGS typing (one major cluster of 22 identical or closely related and four outlier isolates with at least 97 distinct SNPs or 63 allelic variants). Resulting tree topologies are highly congruent and grouped the isolates in both cases analogously. Our data show that SNP- and cgMLST-based WGS analyses facilitate high-resolution discrimination of longitudinal MTBC outbreaks. cgMLST allows for a meaningful epidemiological interpretation of the WGS genotyping data. It enables standardized WGS genotyping for epidemiological investigations, e.g., on the regional public health office level, and the creation of web-accessible databases for global TB surveillance with an integrated early warning system. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
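
    The idea of comparing isolates through an allele numbering system can be sketched as a count of loci at which two allelic profiles differ. The code below is a hedged illustration only: it does not reproduce the Ridom SeqSphere+ implementation, and the locus names, allele numbers, isolate profiles, and the choice to skip loci missing in either isolate are all assumptions.

```python
def allele_distance(profile_a, profile_b):
    """Number of shared, called loci at which two allelic profiles differ.

    Each profile maps a locus name to an allele number, or to None when
    the locus could not be called. Loci absent or uncalled in either
    profile are skipped rather than counted as differences.
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical profiles over four loci of a core genome scheme.
isolate_1 = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": 7}
isolate_2 = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": 9}
isolate_3 = {"locus_0001": 3, "locus_0002": None, "locus_0003": 5, "locus_0004": 9}

print(allele_distance(isolate_1, isolate_2))
print(allele_distance(isolate_1, isolate_3))
```

    Because the comparison uses small integers per locus rather than raw sequence, profiles of this kind are cheap to store and exchange between laboratories, which matches the portability the abstract emphasizes.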

  16. Probabilistic seismic history matching using binary images

    Science.gov (United States)

    Davolio, Alessandra; Schiozer, Denis Jose

    2018-02-01

    Currently, the goal of history-matching procedures is not only to provide a model matching any observed data but also to generate multiple matched models to properly handle uncertainties. One such approach is a probabilistic history-matching methodology based on the discrete Latin Hypercube sampling algorithm, proposed in previous works, which was particularly efficient for matching well data (production rates and pressure). 4D seismic (4DS) data have been increasingly included in history-matching procedures. A key issue in seismic history matching (SHM) is to transfer data into a common domain: impedance, amplitude or pressure, and saturation. In any case, seismic inversions and/or modeling are required, which can be time consuming. An alternative that avoids these procedures is to use binary images in SHM, as they allow the shape, rather than the physical values, of observed anomalies to be matched. This work presents the incorporation of binary images in SHM within the aforementioned probabilistic history matching. The application was performed with real data from a segment of the Norne benchmark case that presents strong 4D anomalies, including softening signals due to pressure buildup. The binary images are used to match the pressurized zones observed in time-lapse data. Three history matchings were conducted using: only well data, well and 4DS data, and only 4DS data. The methodology is very flexible and successfully incorporated binary images into the seismic objective functions. Results showed good convergence of the method within a few iterations for all three cases. The matched models of the first two cases provided the best results, with similar well matching quality. The second case provided models presenting pore pressure changes according to the expected dynamic behavior (pressurized zones) observed in 4DS data. The use of binary images in SHM is relatively new, with few examples in the literature. This work enriches this discussion by presenting a new

  17. Genomic and epigenetic evidence for oxytocin receptor deficiency in autism

    Directory of Open Access Journals (Sweden)

    Worley Gordon

    2009-10-01

    Full Text Available Abstract Background Autism comprises a spectrum of behavioral and cognitive disturbances of childhood development and is known to be highly heritable. Although numerous approaches have been used to identify genes implicated in the development of autism, less than 10% of autism cases have been attributed to single gene disorders. Methods We describe the use of high-resolution genome-wide tilepath microarrays and comparative genomic hybridization to identify copy number variants within 119 probands from multiplex autism families. We next carried out DNA methylation analysis by bisulfite sequencing in a proband and his family, expanding this analysis to methylation analysis of peripheral blood and temporal cortex DNA of autism cases and matched controls from independent datasets. We also assessed oxytocin receptor (OXTR) gene expression within the temporal cortex tissue by quantitative real-time polymerase chain reaction (PCR). Results Our analysis revealed that a genomic deletion containing the oxytocin receptor gene, OXTR (MIM accession no.: 167055), previously implicated in autism, was present in an autism proband and his mother, who exhibits symptoms of obsessive-compulsive disorder. The proband's affected sibling did not harbor this deletion but instead may exhibit epigenetic misregulation of this gene through aberrant gene silencing by DNA methylation. Further DNA methylation analysis of the CpG island known to regulate OXTR expression identified several CpG dinucleotides that show independent statistically significant increases in DNA methylation status in the peripheral blood cells and temporal cortex in independent datasets of individuals with autism as compared to control samples. Associated with the increase in methylation of these CpG dinucleotides is our finding that OXTR mRNA showed decreased expression in the temporal cortex tissue of autism cases compared to age- and sex-matched controls. Conclusion Together, these data provide

  18. Genome size analyses of Pucciniales reveal the largest fungal genomes.

    Science.gov (United States)

    Tavares, Sílvia; Ramos, Ana Paula; Pires, Ana Sofia; Azinheira, Helena G; Caldeirinha, Patrícia; Link, Tobias; Abranches, Rita; Silva, Maria do Céu; Voegele, Ralf T; Loureiro, João; Talhinhas, Pedro

    2014-01-01

    Rust fungi (Basidiomycota, Pucciniales) are biotrophic plant pathogens which exhibit diverse complexities in their life cycles and host ranges. The completion of genome sequencing of a few rust fungi has revealed the occurrence of large genomes. Sequencing efforts for other rust fungi have been hampered by uncertainty concerning their genome sizes. Flow cytometry was recently applied to estimate the genome size of a few rust fungi, and confirmed the occurrence of large genomes in this order (averaging 225.3 Mbp, while the average for Basidiomycota was 49.9 Mbp and was 37.7 Mbp for all fungi). In this work, we have used an innovative and simple approach to simultaneously isolate nuclei from the rust and its host plant in order to estimate the genome size of 30 rust species by flow cytometry. Genome sizes varied over 10-fold, from 70 to 893 Mbp, with an average genome size value of 380.2 Mbp. Compared to the genome sizes of over 1800 fungi, Gymnosporangium confusum possesses the largest fungal genome ever reported (893.2 Mbp). Moreover, even the smallest rust genome determined in this study is larger than the vast majority of fungal genomes (94%). The average genome size of the Pucciniales is now 305.5 Mbp, while the average Basidiomycota genome size has shifted to 70.4 Mbp and the average for all fungi has reached 44.2 Mbp. Although no correlation could be drawn between the genome sizes, the phylogenomics or the life cycle of rust fungi, it is interesting to note that rusts with Fabaceae hosts present genomes clearly larger than those with Poaceae hosts. Although this study comprises only a small fraction of the more than 7000 rust species described, it already seems evident that the Pucciniales represent a group where genome size expansion could be a common characteristic. This is in sharp contrast to sister taxa, placing this order in a relevant position in fungal genomics research.

  19. Preimplantation genetic diagnosis with HLA matching.

    Science.gov (United States)

    Rechitsky, Svetlana; Kuliev, Anver; Tur-Kaspa, Illan; Morris, Randy; Verlinsky, Yury

    2004-08-01

    Preimplantation genetic diagnosis (PGD) has recently been offered in combination with HLA typing, which allowed a successful haematopoietic reconstitution in affected siblings with Fanconi anaemia by transplantation of stem cells obtained from the HLA-matched offspring resulting from PGD. This study presents the results of the first PGD practical experience performed in a group of couples at risk for producing children with genetic disorders. These parents also requested preimplantation HLA typing for treating the affected children in the family, who required HLA-matched stem cell transplantation. Using a standard IVF procedure, oocytes or embryos were tested for causative gene mutations simultaneously with HLA alleles, selecting and transferring only those unaffected embryos that were HLA matched to the affected siblings. The procedure was performed for patients with children affected by Fanconi anaemia (FANC) A and C, different thalassaemia mutations, Wiskott-Aldrich syndrome, X-linked adrenoleukodystrophy, X-linked hyperimmunoglobulin M syndrome and X-linked hypohidrotic ectodermal dysplasia with immune deficiency. Overall, 46 PGD cycles were performed for 26 couples, resulting in selection and transfer of 50 unaffected HLA-matched embryos in 33 cycles, yielding six HLA-matched clinical pregnancies and the birth of five unaffected HLA-matched children. Despite the controversy of PGD use for HLA typing, the data demonstrate the usefulness of this approach for at-risk couples, not only to avoid the birth of affected children with an inherited disease, but also for having unaffected children who may also be potential HLA-matched donors of stem cells for treatment of affected siblings.

  20. Practical Approaches for Detecting Selection in Microbial Genomes.

    Directory of Open Access Journals (Sweden)

    Jessica Hedge

    2016-02-01

    Full Text Available Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  1. Matching with transfer matrices

    International Nuclear Information System (INIS)

    Perez-Alvarez, R.; Velasco, V.R.; Garcia-Moliner, F.; Rodriguez-Coppola, H.

    1987-10-01

    An ABC configuration - which corresponds to various systems of physical interest, such as a barrier or a quantum well - is studied by combining a surface Green function matching analysis of the entire system with a description of the intermediate (B) region in terms of a transfer matrix in the sense of Mora et al. (1985). This hybrid approach proves very useful when it is very difficult to construct the corresponding Green function G_B. An application is made to the calculation of quantised subband levels in a parabolic quantum well. Further possibilities of extension of this approach are pointed out. (author). 27 refs, 1 tab

  2. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact on structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small

  3. Effect of genomics-related literacy on non-communicable diseases.

    Science.gov (United States)

    Nakamura, Sho; Narimatsu, Hiroto; Katayama, Kayoko; Sho, Ri; Yoshioka, Takashi; Fukao, Akira; Kayama, Takamasa

    2017-09-01

    Recent progress in genomic research has raised expectations for the development of personalized preventive medicine, although genomics-related literacy of patients will be essential. Thus, enhancing genomics-related literacy is crucial, particularly for individuals with low genomics-related literacy because they might otherwise miss the opportunity to receive personalized preventive care. This should be especially emphasized when a lack of genomics-related literacy is associated with elevated disease risk, because patients could therefore be deprived of the added benefits of preventive interventions; however, whether such an association exists is unclear. Association between genomics-related literacy, calculated as the genomics literacy score (GLS), and the prevalence of non-communicable diseases was assessed using propensity score matching on 4646 participants (males: 1891; 40.7%). Notably, the low-GLS group (score below median) presented a higher risk of hypertension (relative risk (RR) 1.09, 95% confidence interval (CI) 1.03-1.16) and obesity (RR 1.11, 95% CI 1.01-1.22) than the high-GLS group. Our results suggest that a low level of genomics-related literacy could represent a risk factor for hypertension and obesity. Evaluating genomics-related literacy could be used to identify a more appropriate population for health and educational interventions.

  4. Adaptation of Lactococcus lactis to its environment : a genomics approach

    NARCIS (Netherlands)

    Zomer, Albertus Lambert

    2007-01-01

    This thesis describes a number of strategies of Lactococcus lactis to adapt to its ever-changing environment. Although the complete genome sequence of L. lactis subspecies lactis IL1403 became available when this research was started, the genome sequence of the lactic acid bacterial paradigm, L.

  5. So many genes, so little time: A practical approach to divergence-time estimation in the genomic era.

    Science.gov (United States)

    Smith, Stephen A; Brown, Joseph W; Walker, Joseph F

    2018-01-01

    Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated

  6. One bacterial cell, one complete genome.

    Directory of Open Access Journals (Sweden)

    Tanja Woyke

    2010-04-01

    While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200 to 900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  7. One Bacterial Cell, One Complete Genome

    Energy Technology Data Exchange (ETDEWEB)

    Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos; Clum, Alicia; Copeland, Alex; Schackwitz, Wendy; Lapidus, Alla; Wu, Dongying; McCutcheon, John P.; McDonald, Bradon R.; Moran, Nancy A.; Bristow, James; Cheng, Jan-Fang

    2010-04-26

    While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200 to 900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  8. A critique of race-based and genomic medicine.

    Science.gov (United States)

    Meier, Robert J

    2012-03-01

    Now that a composite human genome has been sequenced (HGP), research has accelerated to discover precise genetic bases of several chronic health issues, particularly in the realms of cancer and cardiovascular disease. It is anticipated that in the future it will be possible and cost effective to regularly sequence individual genomes, and thereby produce a DNA profile that potentially can be used to assess the health risks for each person with respect to certain genetically predisposed conditions. Coupled with that enormous diagnostic power, it will then depend upon equally rapid research efforts to develop personalized courses of treatment, including that of pharmaceutical therapy. Initial treatment attempts have been made to match drug efficacy and safety to individuals of assigned or self-identified groups according to their genetic ancestry or presumed race. A prime example is that of BiDil, which was the first drug approved by the US FDA for the explicit treatment of heart patients of African American ancestry. This race-based approach to medicine has been met with justifiable criticism, notably on ethical grounds that have long plagued historical applications and misuses of human race classification, and also on questionable science. This paper will assess race-based medical research and practice in light of a more thorough understanding of human genetic variability. Additional concerns will be expressed with regard to the rapidly developing area of pharmacogenomics, promoted to be the future of personalized medicine. Genomic epidemiology will be discussed with several examples of on-going research that hopefully will provide a solid scientific grounding for personalized medicine to build upon.

  9. Reefgenomics.Org - a repository for marine genomics data

    KAUST Repository

    Liew, Yi Jin

    2016-11-01

    Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data.

  10. Reefgenomics.Org - a repository for marine genomics data

    KAUST Repository

    Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R.

    2016-01-01

    Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data.

  11. Anatomy Ontology Matching Using Markov Logic Networks

    Directory of Open Access Journals (Sweden)

    Chunhua Li

    2016-01-01

    The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relationships between ontologies describing different species. Ontology matching is a class of techniques for finding semantic correspondences between entities of different ontologies. Markov logic networks, which unify probabilistic graphical models and first-order logic, provide an excellent framework for ontology matching. We combine several different matching strategies through first-order logic formulas according to the structure of anatomy ontologies. Experiments on the adult mouse anatomy and the human anatomy have demonstrated the effectiveness of the proposed approach in terms of the quality of the resulting alignment.

  12. Bioinformatics of genomic association mapping

    NARCIS (Netherlands)

    Vaez Barzani, Ahmad

    2015-01-01

    In this thesis we present an overview of bioinformatics-based approaches for genomic association mapping, with emphasis on human quantitative traits and their contribution to complex diseases. We aim to provide a comprehensive walk-through of the classic steps of genomic association mapping

  13. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol.

    Directory of Open Access Journals (Sweden)

    Ping Zeng

    In this paper, based on our previous multi-pattern uniform resource locator (URL) binary-matching algorithm called HEM, we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to the fields of network security, data analysis, load balancing, cloud robotic communications, and so on, all of which require string matching from a fixed starting position. Our approach effectively solves the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by using a hash method combined with a binary method that transforms the symbol-space matching problem into a digital-space numerical-size comparison and hashing problem. The MH approach has a fast matching speed, requires little memory, performs better than both the classical algorithms and HEM for matching fields in an HTTP stream, and has great promise for use in real-world applications.
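
    The abstract does not give the details of MH or HEM, so the following is only a minimal sketch of the general idea of combining a hash lookup with a binary search for patterns anchored at a fixed starting position. The class name AnchoredMatcher, the key_len parameter, and the bucket layout are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right


class AnchoredMatcher:
    """Toy hash + binary-search matcher for patterns anchored at position 0.

    Hypothetical illustration only; this is not the MH algorithm itself.
    """

    def __init__(self, patterns, key_len=4):
        self.key_len = key_len
        self.buckets = {}
        for p in patterns:
            if len(p) < key_len:
                raise ValueError("pattern shorter than key_len: %r" % p)
            # Hash step: group patterns by their first key_len characters.
            self.buckets.setdefault(p[:key_len], []).append(p)
        for bucket in self.buckets.values():
            bucket.sort()

    def match(self, text):
        """Return the longest pattern that is a prefix of text, or None."""
        bucket = self.buckets.get(text[: self.key_len])
        if bucket is None:
            return None
        # Binary step: every prefix of text sorts at or before text, and
        # longer prefixes sort after shorter ones, so scan back from here.
        i = bisect_right(bucket, text)
        for candidate in reversed(bucket[:i]):
            if text.startswith(candidate):
                return candidate
        return None


matcher = AnchoredMatcher(
    ["/api/users", "/api/users/admin", "/static/", "/api/orders"]
)
print(matcher.match("/api/users/42"))
```

    The hash lookup discards most patterns in constant time, and the sorted bucket confines the remaining comparisons to the candidates that can still be prefixes of the input.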

  14. Robust and accurate multi-view reconstruction by prioritized matching

    DEFF Research Database (Denmark)

    Ylimaki, Markus; Kannala, Juho; Holappa, Jukka

    2012-01-01

    a prioritized matching method which expands the most promising seeds first. The output of the method is a three-dimensional point cloud. Unlike previous correspondence growing approaches our method allows to use the best-first matching principle in the generic multi-view stereo setting with arbitrary number...... of input images. Our experiments show that matching the most promising seeds first provides very robust point cloud reconstructions efficiently with just a single expansion step. A comparison to the current state-of-the-art shows that our method produces reconstructions of similar quality but significantly...

  15. Defining functional DNA elements in the human genome

    Science.gov (United States)

    Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

    2014-01-01

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594

  16. Fast group matching for MR fingerprinting reconstruction.

    Science.gov (United States)

    Cauley, Stephen F; Setsompop, Kawin; Ma, Dan; Jiang, Yun; Ye, Huihui; Adalsteinsson, Elfar; Griswold, Mark A; Wald, Lawrence L

    2015-08-01

    MR fingerprinting (MRF) is a technique for quantitative tissue mapping using pseudorandom measurements. To estimate tissue properties such as T1, T2, proton density, and B0, the rapidly acquired data are compared against a large dictionary of Bloch simulations. This matching process can be a very computationally demanding portion of MRF reconstruction. We introduce a fast group matching algorithm (GRM) that exploits inherent correlation within MRF dictionaries to create highly clustered groupings of the elements. During matching, a group specific signature is first used to remove poor matching possibilities. Group principal component analysis (PCA) is used to evaluate all remaining tissue types. In vivo 3 Tesla brain data were used to validate the accuracy of our approach. For a trueFISP sequence with over 196,000 dictionary elements, 1000 MRF samples, and image matrix of 128 × 128, GRM was able to map MR parameters within 2 s using standard vendor computational resources. This is an order of magnitude faster than global PCA and nearly two orders of magnitude faster than direct matching, with comparable accuracy (1-2% relative error). The proposed GRM method is a highly efficient model reduction technique for MRF matching and should enable clinically relevant reconstruction accuracy and time on standard vendor computational resources. © 2014 Wiley Periodicals, Inc.
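
    For orientation, the direct matching baseline that the abstract compares against can be sketched as an exhaustive search for the dictionary entry with the largest normalized inner product. This is not the GRM algorithm. The toy dictionary of exponential decays below is a hypothetical stand-in for Bloch simulations, and the function names are assumptions.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def direct_match(signal, dictionary):
    """Return (index, score) of the best-matching dictionary entry.

    The score is the absolute normalized inner product, so it does not
    depend on the overall scale of the measured signal.
    """
    s = normalize(signal)
    best_idx, best_score = -1, -1.0
    for i, entry in enumerate(dictionary):
        e = normalize(entry)
        score = abs(sum(a * b for a, b in zip(s, e)))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Hypothetical toy dictionary: one decay curve per candidate time constant.
time_constants = [1.0, 2.0, 4.0, 8.0]
toy_dictionary = [
    [math.exp(-t / tc) for t in range(8)] for tc in time_constants
]

# A measured signal that is a scaled copy of the third entry.
measured = [3.0 * x for x in toy_dictionary[2]]
index, score = direct_match(measured, toy_dictionary)
print(time_constants[index], round(score, 6))
```

    The cost of this search grows with the number of dictionary entries times the signal length, which is the expense that grouping and PCA-based reduction aim to cut.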

  17. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    Energy Technology Data Exchange (ETDEWEB)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel; Jurkevitch, Edouard; Sorek, Rotem; Ben-Jacob, Eshel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
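
    The filtering step described above amounts to a set difference: variants seen in the mutant relative to the mediator, minus those that also appear in the WT relative to the mediator. A minimal sketch follows, assuming variants have already been called and are represented as (position, reference base, alternate base) tuples. The function name and the toy variant lists are hypothetical and carry no data from the study.

```python
def candidate_mutations(mutant_vs_mediator, wt_vs_mediator):
    """Keep mutant variants that do not recur in the wild type.

    Both arguments are iterables of (position, ref_base, alt_base) tuples
    called against the same mediator genome.
    """
    shared_with_wt = set(wt_vs_mediator)
    return sorted(v for v in set(mutant_vs_mediator) if v not in shared_with_wt)


# Hypothetical toy calls: three strain-level differences shared by WT and
# mutant, plus one variant present only in the mutant.
wt_calls = [(120, "A", "G"), (4510, "C", "T"), (9800, "G", "A")]
mutant_calls = wt_calls + [(7321, "T", "C")]

print(candidate_mutations(mutant_calls, wt_calls))
```

    In practice, positions where the WT has insufficient read coverage would need separate handling, since a variant cannot be ruled out as shared when the WT was not observed there.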

  18. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  19. RMP: Reduced-set matching pursuit approach for efficient compressed sensing signal reconstruction

    Directory of Open Access Journals (Sweden)

    Michael M. Abdel-Sayed

    2016-11-01

    Compressed sensing enables the acquisition of sparse signals at a rate that is much lower than the Nyquist rate. Compressed sensing initially adopted ℓ1 minimization for signal reconstruction, which is computationally expensive. Several greedy recovery algorithms have been recently proposed for signal reconstruction at a lower computational complexity compared to the optimal ℓ1 minimization, while maintaining a good reconstruction accuracy. In this paper, the Reduced-set Matching Pursuit (RMP) greedy recovery algorithm is proposed for compressed sensing. Unlike existing approaches which either select too many or too few values per iteration, RMP aims at selecting the most sufficient number of correlation values per iteration, which improves both the reconstruction time and error. Furthermore, RMP prunes the estimated signal, and hence, excludes the incorrectly selected values. The RMP algorithm achieves a higher reconstruction accuracy at a significantly lower computational complexity compared to existing greedy recovery algorithms. It is even superior to ℓ1 minimization in terms of the normalized time-error product, a new metric introduced to measure the trade-off between the reconstruction time and error. RMP's superior performance is illustrated with both noiseless and noisy samples.
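
    The abstract does not specify RMP's selection or pruning rules, so the sketch below shows only plain matching pursuit, the simplest member of the greedy family it belongs to: one atom is selected per iteration by largest correlation with the residual. The function names and the orthonormal toy dictionary are assumptions chosen so that the result is easy to check by hand.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(y, atoms, n_iter, tol=1e-12):
    """Greedy sparse approximation of y over unit-norm atoms.

    Returns (coefficients, residual). Each iteration picks the single atom
    most correlated with the current residual and subtracts its
    contribution.
    """
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        if abs(corr[k]) <= tol:
            break  # nothing left to explain
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy dictionary: the standard basis of R^4 (orthonormal, so recovery is exact).
basis = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
observed = [0.0, 2.5, 0.0, -1.0]

coefficients, leftover = matching_pursuit(observed, basis, n_iter=10)
print(coefficients)
```

    With a genuine compressed sensing matrix the atoms are correlated, which is why the number of values selected per iteration and the pruning of wrong selections matter for accuracy and run time.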

  20. RMP: Reduced-set matching pursuit approach for efficient compressed sensing signal reconstruction.

    Science.gov (United States)

    Abdel-Sayed, Michael M; Khattab, Ahmed; Abu-Elyazeed, Mohamed F

    2016-11-01

    Compressed sensing enables the acquisition of sparse signals at a rate that is much lower than the Nyquist rate. Compressed sensing initially adopted ℓ1 minimization for signal reconstruction, which is computationally expensive. Several greedy recovery algorithms have been recently proposed for signal reconstruction at a lower computational complexity compared to the optimal ℓ1 minimization, while maintaining a good reconstruction accuracy. In this paper, the Reduced-set Matching Pursuit (RMP) greedy recovery algorithm is proposed for compressed sensing. Unlike existing approaches which either select too many or too few values per iteration, RMP aims at selecting the most sufficient number of correlation values per iteration, which improves both the reconstruction time and error. Furthermore, RMP prunes the estimated signal, and hence, excludes the incorrectly selected values. The RMP algorithm achieves a higher reconstruction accuracy at a significantly lower computational complexity compared to existing greedy recovery algorithms. It is even superior to ℓ1 minimization in terms of the normalized time-error product, a new metric introduced to measure the trade-off between the reconstruction time and error. RMP's superior performance is illustrated with both noiseless and noisy samples.

  1. Single virus genomics: a new tool for virus discovery.

    Directory of Open Access Journals (Sweden)

    Lisa Zeigler Allen

    Whole genome amplification and sequencing of single microbial cells has significantly influenced genomics and microbial ecology by facilitating direct recovery of reference genome data. However, viral genomics continues to suffer due to difficulties related to the isolation and characterization of uncultivated viruses. We report here on a new approach called 'Single Virus Genomics', which enabled the isolation and complete genome sequencing of the first single virus particle. A mixed assemblage comprising two known viruses, E. coli bacteriophages lambda and T4, was sorted using flow cytometric methods and subsequently immobilized in an agarose matrix. Genome amplification was then achieved in situ via multiple displacement amplification (MDA). The complete lambda phage genome was recovered with an average depth of coverage of approximately 437X. The isolation and genome sequencing of uncultivated viruses using Single Virus Genomics approaches will enable researchers to address questions about viral diversity, evolution, adaptation and ecology that were previously unattainable.

  2. Covariant diagrams for one-loop matching

    International Nuclear Information System (INIS)

    Zhang, Zhengkang

    2016-10-01

    We present a diagrammatic formulation of recently-revived covariant functional approaches to one-loop matching from an ultraviolet (UV) theory to a low-energy effective field theory. Various terms following from a covariant derivative expansion (CDE) are represented by diagrams which, unlike conventional Feynman diagrams, involve gauge-covariant quantities and are thus dubbed “covariant diagrams.” The use of covariant diagrams helps organize and simplify one-loop matching calculations, which we illustrate with examples. Of particular interest is the derivation of UV model-independent universal results, which reduce matching calculations of specific UV models to applications of master formulas. We show how such derivation can be done in a more concise manner than the previous literature, and discuss how additional structures that are not directly captured by existing universal results, including mixed heavy-light loops, open covariant derivatives, and mixed statistics, can be easily accounted for.

  3. Covariant diagrams for one-loop matching

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Zhengkang [Michigan Center for Theoretical Physics (MCTP), University of Michigan, 450 Church Street, Ann Arbor, MI 48109 (United States); Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, 22607 Hamburg (Germany)]

    2017-05-30

    We present a diagrammatic formulation of recently-revived covariant functional approaches to one-loop matching from an ultraviolet (UV) theory to a low-energy effective field theory. Various terms following from a covariant derivative expansion (CDE) are represented by diagrams which, unlike conventional Feynman diagrams, involve gauge-covariant quantities and are thus dubbed “covariant diagrams.” The use of covariant diagrams helps organize and simplify one-loop matching calculations, which we illustrate with examples. Of particular interest is the derivation of UV model-independent universal results, which reduce matching calculations of specific UV models to applications of master formulas. We show how such derivation can be done in a more concise manner than the previous literature, and discuss how additional structures that are not directly captured by existing universal results, including mixed heavy-light loops, open covariant derivatives, and mixed statistics, can be easily accounted for.

  4. Covariant diagrams for one-loop matching

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Zhengkang [Michigan Univ., Ann Arbor, MI (United States). Michigan Center for Theoretical Physics; Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany)]

    2016-10-15

    We present a diagrammatic formulation of recently-revived covariant functional approaches to one-loop matching from an ultraviolet (UV) theory to a low-energy effective field theory. Various terms following from a covariant derivative expansion (CDE) are represented by diagrams which, unlike conventional Feynman diagrams, involve gauge-covariant quantities and are thus dubbed “covariant diagrams.” The use of covariant diagrams helps organize and simplify one-loop matching calculations, which we illustrate with examples. Of particular interest is the derivation of UV model-independent universal results, which reduce matching calculations of specific UV models to applications of master formulas. We show how such derivation can be done in a more concise manner than the previous literature, and discuss how additional structures that are not directly captured by existing universal results, including mixed heavy-light loops, open covariant derivatives, and mixed statistics, can be easily accounted for.

  5. Covariant diagrams for one-loop matching

    International Nuclear Information System (INIS)

    Zhang, Zhengkang

    2017-01-01

    We present a diagrammatic formulation of recently-revived covariant functional approaches to one-loop matching from an ultraviolet (UV) theory to a low-energy effective field theory. Various terms following from a covariant derivative expansion (CDE) are represented by diagrams which, unlike conventional Feynman diagrams, involve gauge-covariant quantities and are thus dubbed “covariant diagrams.” The use of covariant diagrams helps organize and simplify one-loop matching calculations, which we illustrate with examples. Of particular interest is the derivation of UV model-independent universal results, which reduce matching calculations of specific UV models to applications of master formulas. We show how such derivation can be done in a more concise manner than the previous literature, and discuss how additional structures that are not directly captured by existing universal results, including mixed heavy-light loops, open covariant derivatives, and mixed statistics, can be easily accounted for.

  6. Genomic taxonomy of vibrios

    Directory of Open Access Journals (Sweden)

    Iida Tetsuya

    2009-10-01

    Background: Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e. data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios. Results: We have generated four new genome sequences of three Vibrio species, i.e., V. alginolyticus 40B, V. harveyi-like 1DA3, and V. mimicus strains VM573 and VM603, and present a broad analysis of these genomes along with other sequenced Vibrio species. The genome atlas and pangenome plots provide a tantalizing image of the genomic differences that occur between closely related sister species, e.g. V. cholerae and V. mimicus. The vibrio pangenome contains around 26504 genes. The V. cholerae core genome and pangenome consist of 1520 and 6923 genes, respectively. Pangenomes might allow different strains of V. cholerae to occupy different niches. MLSA and supertree analyses resulted in a similar phylogenetic picture, with a clear distinction of four groups (Vibrio core group, V. cholerae-V. mimicus, Aliivibrio spp., and Photobacterium spp.). A Vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree analysis, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree. Conclusion: The combination of different analytical and bioinformatics tools will enable the most accurate species identification through genomic computational analysis. This endeavour will culminate in

  7. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  8. Genome-scale identification of Legionella pneumophila effectors using a machine learning approach.

    Directory of Open Access Journals (Sweden)

    David Burstein

    2009-07-01

    Full Text Available A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine
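    As a small illustration of the kind of per-ORF feature the abstract mentions, the sketch below computes G+C content. The function names and the two-entry feature dictionary are hypothetical; the study's real feature set (taxonomical dispersion, regulatory data and so on) is far richer and is not reproduced here.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def orf_features(seq):
    """Toy feature dictionary for one ORF (illustrative only)."""
    return {"gc": gc_content(seq), "length": len(seq)}


features = orf_features("ATGAAATTTGGCTAA")
```

    A classifier would be trained on many such feature vectors labelled as effector or non-effector; which learning algorithms the authors used is not stated in this record.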

  9. 76 FR 5235 - Privacy Act of 1974, as Amended; Computer Matching Program (SSA Internal Match)-Match Number 1014

    Science.gov (United States)

    2011-01-28

    ...; Computer Matching Program (SSA Internal Match)--Match Number 1014 AGENCY: Social Security Administration... regarding protections for such persons. The Privacy Act, as amended, regulates the use of computer matching....C. 552a, as amended, and the provisions of the Computer Matching and Privacy Protection Act of 1988...

  10. Evaluation of the Match External Load in Soccer: Methods Comparison.

    Science.gov (United States)

    Castagna, Carlo; Varley, Matthew; Póvoas, Susana C A; D'Ottavio, Stefano

    2017-04-01

    To test the interchangeability of 2 match-analysis approaches for external-load detection considering arbitrarily selected speeds and metabolic power (MP) thresholds in male top-level soccer. Data analyses were performed considering match physical performance of 60 matches (1200 player cases) of randomly selected Spanish, German, and English first-division championship matches (2013-14 season). Match analysis was performed with a validated semiautomated multicamera system operating at 25 Hz. During a match, players covered 10,673 ± 348 m, of which 1778 ± 208 m and 2759 ± 241 m were performed at high intensity, as measured using speed (≥16 km/h, HI) and metabolic power (≥20 W/kg, MPHI) notations. High-intensity notations were nearly perfectly associated (r = .93, P ... Player high-intensity decelerations (≥-2 m/s²) were very largely associated with MPHI (r = .73, P ... physical match-analysis methods can be independently used to track match external load in elite-level players. However, match-analyst decisions must be based on use of a single method to avoid bias in external-load determination.
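    The speed-based notation described above can be sketched as follows, assuming a player's speed is available as one value in km/h per tracking sample. The 25 Hz sampling rate and the 16 km/h threshold come from the abstract; the function name and the simple rectangle-rule integration are assumptions for illustration.

```python
def high_intensity_distance(speeds_kmh, hz=25.0, threshold_kmh=16.0):
    """Metres covered while speed is at or above threshold_kmh.

    speeds_kmh: one speed sample per tracking frame, in km/h.
    Each sample is assumed to hold for 1/hz seconds.
    """
    dt = 1.0 / hz
    # km/h -> m/s is a division by 3.6; distance = speed * time.
    return sum(v / 3.6 * dt for v in speeds_kmh if v >= threshold_kmh)


# One second at 18 km/h followed by one second at 10 km/h.
samples = [18.0] * 25 + [10.0] * 25
distance_m = high_intensity_distance(samples)
```

    The metabolic-power notation would need acceleration as well as speed, and its energy-cost model is not given in this record, so it is not sketched here.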

  11. Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria.

    Science.gov (United States)

    Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

    2017-04-01

    A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria. Copyright © 2017 Repar et al.

  12. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes

    Science.gov (United States)

    Doerr, Daniel; Chauve, Cedric

    2017-01-01

    Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseilles outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95 % of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402

  13. Mudskipper genomes provide insights into the terrestrial adaptation of amphibious fishes

    DEFF Research Database (Denmark)

    You, Xinxin; Bian, Chao; Zan, Qijie

    2014-01-01

    Mudskippers are amphibious fishes that have developed morphological and physiological adaptations to match their unique lifestyles. Here we perform whole-genome sequencing of four representative mudskippers to elucidate the molecular mechanisms underlying these adaptations. We discover an expansi...

  14. Technical performance and match-to-match variation in elite football teams.

    Science.gov (United States)

    Liu, Hongyou; Gómez, Miguel-Angel; Gonçalves, Bruno; Sampaio, Jaime

    2016-01-01

    Recent research suggests that match-to-match variation adds important information to performance descriptors in team sports, as it helps measure how players fine-tune their tactical behaviours and technical actions to the extreme dynamical environments. The current study aims to identify the differences in technical performance of players from strong and weak teams and to explore match-to-match variation of players' technical match performance. Performance data of all the 380 matches of season 2012-2013 in the Spanish First Division Professional Football League were analysed. Twenty-one performance-related match actions and events were chosen as variables in the analyses. Players' technical performance profiles were established by unifying count values of each action or event of each player per match into the same scale. Means of these count values of players from Top3 and Bottom3 teams were compared and plotted into radar charts. Coefficient of variation of each match action or event within a player was calculated to represent his match-to-match variation of technical performance. Differences in the variation of technical performances of players across different match contexts (team and opposition strength, match outcome and match location) were compared. All the comparisons were achieved by the magnitude-based inferences. Results showed that technical performances differed between players of strong and weak teams from different perspectives across different field positions. Furthermore, the variation of the players' technical performance is affected by the match context, with effects from team and opposition strength greater than effects from match location and match outcome.
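    The match-to-match variation measure described above, a coefficient of variation per player and action, can be sketched as below. The function name and the use of the sample standard deviation are assumptions; the abstract does not say which estimator the authors used.

```python
import statistics


def coefficient_of_variation(counts):
    """Standard deviation divided by the mean of one player's per-match counts.

    Needs at least two matches; returns None when the mean is zero,
    because the ratio is then undefined.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        return None
    return statistics.stdev(counts) / mean


# Hypothetical passes per match for one player over four matches.
passes_per_match = [42, 38, 51, 45]
cv = coefficient_of_variation(passes_per_match)
```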

  15. Small Vocabulary with Saliency Matching for Video Copy Detection

    DEFF Research Database (Denmark)

    Ren, Huamin; Moeslund, Thomas B.; Tang, Sheng

    2013-01-01

    The importance of copy detection has led to a substantial amount of research in recent years, among which Bag of visual Words (BoW) plays an important role due to its ability to effectively handle occlusion and some minor transformations. One crucial issue in BoW approaches is the size of vocab... matching algorithm based on salient visual words selection. More specifically, the variation of visual words across a given video is represented as trajectories and those containing locally asymptotically stable points are selected as salient visual words. Then we attempt to measure the similarity of two... videos through saliency matching merely based on the selected salient visual words to remove false positives. Our experiments show that a small codebook with saliency matching is quite competitive in video copy detection. With the incorporation of the proposed saliency matching, the precision can...

  16. Genomics-assisted breeding in fruit trees

    OpenAIRE

    Iwata, Hiroyoshi; Minamikawa, Mai F.; Kajiya-Kanegae, Hiromi; Ishimori, Motoyuki; Hayashi, Takeshi

    2016-01-01

    Recent advancements in genomic analysis technologies have opened up new avenues to promote the efficiency of plant breeding. Novel genomics-based approaches for plant breeding and genetics research, such as genome-wide association studies (GWAS) and genomic selection (GS), are useful, especially in fruit tree breeding. The breeding of fruit trees is hindered by their long generation time, large plant size, long juvenile phase, and the necessity to wait for the physiological maturity of the pl...

  17. Comparison of endoscopic endonasal and bifrontal craniotomy approaches for olfactory groove meningiomas: A matched pair analysis of outcomes and frontal lobe changes on MRI.

    Science.gov (United States)

    de Almeida, John R; Carvalho, Felipe; Vaz Guimaraes Filho, Francisco; Kiehl, Tim-Rasmus; Koutourousiou, Maria; Su, Shirley; Vescan, Allan D; Witterick, Ian J; Zadeh, Gelareh; Wang, Eric W; Fernandez-Miranda, Juan C; Gardner, Paul A; Gentili, Fred; Snyderman, Carl H

    2015-11-01

    We compare the outcomes and postoperative MRI changes of endoscopic endonasal (EEA) and bifrontal craniotomy (BFC) approaches for olfactory groove meningiomas (OGM). All patients who underwent either BFC or EEA for OGM were eligible. Matched pairs were created by matching tumor volumes of an EEA patient with a BFC patient, and matching the timing of the postoperative scans. The tumor dimensions, peritumoral edema, resectability issues, and frontal lobe changes were recorded based on preoperative and postoperative MRI. Postoperative fluid-attenuated inversion recovery (FLAIR) hyperintensity and residual cystic cavity (porencephalic cave) volume were compared using univariable and multivariable analyses. From a total of 70 patients (46 EEA, 24 BFC), 10 matched pairs (20 patients) were created. Three patients (30%) in the EEA group and two (20%) in the BFC had postoperative cerebrospinal fluid leaks (p=0.61). Gross total resections were achieved in seven (70%) of the EEA group and nine (90%) of the BFC group (p=0.26), and one patient from each group developed a recurrence. On postoperative MRI, there was no significant difference in FLAIR signal volumes between EEA and BFC approaches (6.9 versus 13.3 cm(3); p=0.17) or in porencephalic cave volumes (1.7 versus 5.0 cm(3); p=0.11) in univariable analysis. However, in a multivariable analysis, EEA was associated with less postoperative FLAIR change (p=0.02) after adjusting for the volume of preoperative edema. This study provides preliminary evidence that EEA is associated with quantifiable improvements in postoperative frontal lobe imaging. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Mix-and-match holography

    KAUST Repository

    Peng, Yifan; Dun, Xiong; Sun, Qilin; Heidrich, Wolfgang

    2017-01-01

    target images into pairs of front and rear phase-distorting surfaces. Different target holograms can be decoded by mixing and matching different front and rear surfaces under specific geometric alignments. Our approach, which we call mix... We derive a detailed image formation model for the setting of holographic projection displays, as well as a multiplexing method based on a combination of phase retrieval methods and complex matrix factorization. We demonstrate several application scenarios in both simulation and physical prototypes.

  19. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  20. Sugar Metabolism of the First Thermophilic Planctomycete Thermogutta terrifontis: Comparative Genomic and Transcriptomic Approaches

    Directory of Open Access Journals (Sweden)

    Alexander G. Elcheninov

    2017-11-01

    Full Text Available Xanthan gum, a complex polysaccharide comprising glucose, mannose and glucuronic acid residues, is involved in numerous biotechnological applications in cosmetics, agriculture, pharmaceuticals, food and petroleum industries. Additionally, its oligosaccharides were shown to possess antimicrobial, antioxidant, and few other properties. Yet, despite its extensive usage, little is known about xanthan gum degradation pathways and mechanisms. Thermogutta terrifontis, isolated from a sample of microbial mat developed in a terrestrial hot spring of Kunashir island (Far-East of Russia), was described as the first thermophilic representative of the Planctomycetes phylum. It grows well on xanthan gum either at aerobic or anaerobic conditions. Genomic analysis unraveled the pathways of oligo- and polysaccharides utilization, as well as the mechanisms of aerobic and anaerobic respiration. The combination of genomic and transcriptomic approaches suggested a novel xanthan gum degradation pathway which involves novel glycosidase(s) of DUF1080 family, hydrolyzing xanthan gum backbone beta-glucosidic linkages and beta-mannosidases instead of xanthan lyases, catalyzing cleavage of terminal beta-mannosidic linkages. Surprisingly, the genes coding DUF1080 proteins were abundant in T. terrifontis and in many other Planctomycetes genomes, which, together with our observation that xanthan gum being a selective substrate for many planctomycetes, suggest crucial role of DUF1080 in xanthan gum degradation. Our findings shed light on the metabolism of the first thermophilic planctomycete, capable to degrade a number of polysaccharides, either aerobically or anaerobically, including the biotechnologically important bacterial polysaccharide xanthan gum.

  1. Sugar Metabolism of the First Thermophilic Planctomycete Thermogutta terrifontis: Comparative Genomic and Transcriptomic Approaches

    Science.gov (United States)

    Elcheninov, Alexander G.; Menzel, Peter; Gudbergsdottir, Soley R.; Slesarev, Alexei I.; Kadnikov, Vitaly V.; Krogh, Anders; Bonch-Osmolovskaya, Elizaveta A.; Peng, Xu; Kublanov, Ilya V.

    2017-01-01

    Xanthan gum, a complex polysaccharide comprising glucose, mannose and glucuronic acid residues, is involved in numerous biotechnological applications in cosmetics, agriculture, pharmaceuticals, food and petroleum industries. Additionally, its oligosaccharides were shown to possess antimicrobial, antioxidant, and few other properties. Yet, despite its extensive usage, little is known about xanthan gum degradation pathways and mechanisms. Thermogutta terrifontis, isolated from a sample of microbial mat developed in a terrestrial hot spring of Kunashir island (Far-East of Russia), was described as the first thermophilic representative of the Planctomycetes phylum. It grows well on xanthan gum either at aerobic or anaerobic conditions. Genomic analysis unraveled the pathways of oligo- and polysaccharides utilization, as well as the mechanisms of aerobic and anaerobic respiration. The combination of genomic and transcriptomic approaches suggested a novel xanthan gum degradation pathway which involves novel glycosidase(s) of DUF1080 family, hydrolyzing xanthan gum backbone beta-glucosidic linkages and beta-mannosidases instead of xanthan lyases, catalyzing cleavage of terminal beta-mannosidic linkages. Surprisingly, the genes coding DUF1080 proteins were abundant in T. terrifontis and in many other Planctomycetes genomes, which, together with our observation that xanthan gum being a selective substrate for many planctomycetes, suggest crucial role of DUF1080 in xanthan gum degradation. Our findings shed light on the metabolism of the first thermophilic planctomycete, capable to degrade a number of polysaccharides, either aerobically or anaerobically, including the biotechnologically important bacterial polysaccharide xanthan gum. PMID:29163426

  2. Biodiversity Monitoring Using NGS Approaches on Unusual Substrates (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, Tom

    2013-03-01

    Tom Gilbert of the Natural History Museum of Denmark on "Biodiversity monitoring using NGS approaches on unusual substrates" at the 8th Annual Genomics of Energy & Environment Meeting in Walnut Creek, Calif.

  3. Improved Stereo Matching With Boosting Method

    Directory of Open Access Journals (Sweden)

    Shiny B

    2015-06-01

    Full Text Available Abstract This paper presents an approach based on classification for improving the accuracy of stereo matching methods. We propose this method for occlusion handling. This work employs classification of pixels for finding the erroneous disparity values. Due to the wide applications of disparity maps in 3D television, medical imaging, etc., the accuracy of the disparity map has high significance. An initial disparity map is obtained using local or global stereo matching methods from the input stereo image pair. The various features for classification are computed from the input stereo image pair and the obtained disparity map. Then the computed feature vector is used for classification of pixels by using GentleBoost as the classification method. The erroneous disparity values in the disparity map found by classification are corrected through a completion stage or filling stage. A performance evaluation of stereo matching using AdaBoostM1, RUSBoost, neural networks and GentleBoost is performed.

  4. State of otolaryngology match: has competition increased since the "early" match?

    Science.gov (United States)

    Cabrera-Muffly, Cristina; Sheeder, Jeanelle; Abaza, Mona

    2015-05-01

    To examine fluctuations in supply and demand of otolaryngology residency positions after the shift from an "early match" coordinated by the San Francisco match to a "conventional" matching process through the National Residency Matching Program (NRMP). To determine whether competition among otolaryngology residency positions have changed during this time frame. Database analysis. Matching statistics from 1998 to 2013 were obtained for all first-year residency positions through the NRMP. Matching statistics from 1998 to 2005 were obtained for otolaryngology residency positions through the San Francisco match. Univariate analysis was performed, with a P value less than .05 determined as significant. The number of otolaryngology positions and applicants remained proportional to the overall number of positions and applicants in the NRMP match. Otolaryngology applicants per position and the matching rate of all applicants did not change between the 2 time periods studied. The overall match rate of US seniors applying to otolaryngology did not change, while the match rate of non-US seniors decreased significantly following initiation of the conventional match. There was no significant change in United States Medical Licensing Exam step 1 scores or percentage of unfilled otolaryngology residency positions between the 2 time periods. When comparing the early versus conventional otolaryngology match time periods, the only major change was the decreased percentage of matching among non-US senior applicants. Despite a significant shift in match timing after 2006, the supply, demand, and competitiveness of otolaryngology residency positions have not changed significantly. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.

  5. Allele coding in genomic evaluation

    Directory of Open Access Journals (Sweden)

    Christensen Ole F

    2011-06-01

    Full Text Available Abstract Background Genomic data are used in animal breeding to assist genetic evaluation. Several models to estimate genomic breeding values have been studied. In general, two approaches have been used. One approach estimates the marker effects first and then, genomic breeding values are obtained by summing marker effects. In the second approach, genomic breeding values are estimated directly using an equivalent model with a genomic relationship matrix. Allele coding is the method chosen to assign values to the regression coefficients in the statistical model. A common allele coding is zero for the homozygous genotype of the first allele, one for the heterozygote, and two for the homozygous genotype for the other allele. Another common allele coding changes these regression coefficients by subtracting a value from each marker such that the mean of regression coefficients is zero within each marker. We call this centered allele coding. This study considered effects of different allele coding methods on inference. Both marker-based and equivalent models were considered, and restricted maximum likelihood and Bayesian methods were used in inference. Results Theoretical derivations showed that parameter estimates and estimated marker effects in marker-based models are the same irrespective of the allele coding, provided that the model has a fixed general mean. For the equivalent models, the same results hold, even though different allele coding methods lead to different genomic relationship matrices. Calculated genomic breeding values are independent of allele coding when the estimate of the general mean is included into the values. Reliabilities of estimated genomic breeding values calculated using elements of the inverse of the coefficient matrix depend on the allele coding because different allele coding methods imply different models. Finally, allele coding affects the mixing of Markov chain Monte Carlo algorithms, with the centered coding being
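    The two allele codings described in this abstract can be illustrated with a toy genotype matrix (rows are animals, columns are markers). The sketch below centers each marker column on its mean and forms the unscaled cross-product matrix Z Z', which shows that the two codings give different relationship matrices. The function names are hypothetical, and the scaling usually applied to a genomic relationship matrix is deliberately omitted.

```python
def center_markers(genotypes):
    """Subtract each marker's mean so every column of the result sums to zero."""
    n = len(genotypes)
    n_markers = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(n_markers)]
    return [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]


def cross_products(z):
    """Unscaled relationship-like matrix Z Z' (animals by animals)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in z] for r1 in z]


# 0/1/2 coding: three animals, two markers (the second marker is monomorphic).
raw = [[0, 2], [1, 2], [2, 2]]
centered = center_markers(raw)

g_raw = cross_products(raw)            # built from the 0/1/2 coding
g_centered = cross_products(centered)  # built from the centered coding
```

    Here the monomorphic second marker contributes to every entry of `g_raw` but nothing to `g_centered`, a small instance of the abstract's point that different codings imply different relationship matrices.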

  6. A zero-one programming approach to Gulliksen's matched random subtests method

    NARCIS (Netherlands)

    van der Linden, Willem J.; Boekkooi-Timminga, Ellen

    1986-01-01

    In order to estimate the classical coefficient of test reliability, parallel measurements are needed. H. Gulliksen's matched random subtests method, which is a graphical method for splitting a test into parallel test halves, has practical relevance because it maximizes the alpha coefficient as a

  7. Artificial intelligence (AI)-based relational matching and multimodal medical image fusion: generalized 3D approaches

    Science.gov (United States)

    Vajdic, Stevan M.; Katz, Henry E.; Downing, Andrew R.; Brooks, Michael J.

    1994-09-01

    A 3D relational image matching/fusion algorithm is introduced. It is implemented in the domain of medical imaging and is based on Artificial Intelligence paradigms--in particular, knowledge base representation and tree search. The 2D reference and target images are selected from 3D sets and segmented into non-touching and non-overlapping regions, using iterative thresholding and/or knowledge about the anatomical shapes of human organs. Selected image region attributes are calculated. Region matches are obtained using a tree search, and the error is minimized by evaluating a `goodness' of matching function based on similarities of region attributes. Once the matched regions are found and the spline geometric transform is applied to regional centers of gravity, images are ready for fusion and visualization into a single 3D image of higher clarity.

  8. From NGS assembly challenges to instability of fungal mitochondrial genomes: A case study in genome complexity.

    Science.gov (United States)

    Misas, Elizabeth; Muñoz, José Fernando; Gallo, Juan Esteban; McEwen, Juan Guillermo; Clay, Oliver Keatinge

    2016-04-01

    The presence of repetitive or non-unique DNA persisting over sizable regions of a eukaryotic genome can hinder the genome's successful de novo assembly from short reads: ambiguities in assigning genome locations to the non-unique subsequences can result in premature termination of contigs and thus overfragmented assemblies. Fungal mitochondrial (mtDNA) genomes are compact (typically less than 100 kb), yet often contain short non-unique sequences that can be shown to impede their successful de novo assembly in silico. Such repeats can also confuse processes in the cell in vivo. A well-studied example is ectopic (out-of-register, illegitimate) recombination associated with repeat pairs, which can lead to deletion of functionally important genes that are located between the repeats. Repeats that remain conserved over micro- or macroevolutionary timescales despite such risks may indicate functionally or structurally (e.g., for replication) important regions. This principle could form the basis of a mining strategy for accelerating discovery of function in genome sequences. We present here our screening of a sample of 11 fully sequenced fungal mitochondrial genomes by observing where exact k-mer repeats occurred several times; initial analyses motivated us to focus on 17-mers occurring more than three times. Based on the diverse repeats we observe, we propose that such screening may serve as an efficient expedient for gaining a rapid but representative first insight into the repeat landscapes of sparsely characterized mitochondrial chromosomes. Our matching of the flagged repeats to previously reported regions of interest supports the idea that systems of persisting, non-trivial repeats in genomes can often highlight features meriting further attention. Copyright © 2016 Elsevier Ltd. All rights reserved.
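    The screening step described above, flagging exact k-mers that occur more than three times, can be sketched with a simple counter. The defaults follow the abstract (17-mers, more than three occurrences); the function name is hypothetical, and this version counts overlapping windows on the forward strand only, which may differ from the authors' procedure.

```python
from collections import Counter


def repeated_kmers(seq, k=17, min_count=4):
    """Return {kmer: count} for exact k-mers seen at least min_count times.

    min_count=4 corresponds to "more than three times".
    Overlapping windows on the given strand are all counted.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy example with a short k so the repeats are visible by eye.
hits = repeated_kmers("ACGT" * 5, k=4)
```

    For a mitochondrial genome of under 100 kb this brute-force count fits easily in memory; larger genomes would call for a dedicated k-mer counting tool.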

  9. Integration of prior knowledge into dense image matching for video surveillance

    Science.gov (United States)

    Menze, M.; Heipke, C.

    2014-08-01

    Three-dimensional information from dense image matching is a valuable input for a broad range of vision applications. While reliable approaches exist for dedicated stereo setups they do not easily generalize to more challenging camera configurations. In the context of video surveillance the typically large spatial extent of the region of interest and repetitive structures in the scene render the application of dense image matching a challenging task. In this paper we present an approach that derives strong prior knowledge from a planar approximation of the scene. This information is integrated into a graph-cut based image matching framework that treats the assignment of optimal disparity values as a labelling task. Introducing the planar prior heavily reduces ambiguities together with the search space and increases computational efficiency. The results provide a proof of concept of the proposed approach. It allows the reconstruction of dense point clouds in more general surveillance camera setups with wider stereo baselines.

  10. Comparative genomics and association mapping approaches for blast resistant genes in finger millet using SSRs.

    Science.gov (United States)

    Babu, B Kalyana; Dinesh, Pandey; Agrawal, Pawan K; Sood, S; Chandrashekara, C; Bhatt, Jagadish C; Kumar, Anil

    2014-01-01

    The major limiting factor for production and productivity of the finger millet crop is blast disease caused by Magnaporthe grisea. Since the genome sequence information available for finger millet is scarce, comparative genomics plays a very important role in identifying genes/QTLs linked to blast resistance genes using SSR markers. In the present study, a total of 58 genic SSRs were developed for use in genetic analysis of a global collection of 190 finger millet genotypes. The 58 SSRs yielded ninety-five scorable alleles, and the polymorphism information content varied from 0.186 to 0.677 with an average of 0.385. The gene diversity was in the range of 0.208 to 0.726 with an average of 0.487. Association mapping for blast resistance was done using 104 SSR markers, which identified four QTLs for finger blast and one QTL for neck blast resistance. The genomic marker RM262 and genic marker FMBLEST32 were linked to finger blast disease at a P value of 0.007 and explained phenotypic variance (R²) of 10% and 8%, respectively. The genomic marker UGEP81 was associated with finger blast at a P value of 0.009 and explained 7.5% of R². The QTL for neck blast was associated with the genomic SSR marker UGEP18 at a P value of 0.01, which explained 11% of R². Three QTLs for blast resistance were found in common using both GLM and MLM approaches. The resistant alleles were found to be present mostly in the exotic genotypes. Among the genotypes of the NW Himalayan region of India, VHC3997, VHC3996 and VHC3930 were found to be highly resistant and may be effectively used as parents for developing blast-resistant cultivars in that region. The markers linked to the QTLs for blast resistance in the present study can be further used for cloning of the full-length gene, fine mapping, and marker-assisted breeding programmes for introgression of blast-resistant alleles into locally adapted cultivars.
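
    The two per-marker summary statistics quoted in this abstract can be illustrated with a short sketch. The abstract does not say which formulas were used, so the definitions below are assumptions: gene diversity is taken as expected heterozygosity, 1 - Σ p_i², and polymorphism information content (PIC) as the widely used formula of Botstein et al., 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function names are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2) over allele frequencies p_i."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. form, assumed here):
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross_term


# Hypothetical marker with two equally frequent alleles.
allele_freqs = [0.5, 0.5]
print(gene_diversity(allele_freqs))  # 0.5
print(pic(allele_freqs))             # 0.375
```

    Under these definitions PIC is never larger than gene diversity for the same marker, which is consistent with the averages reported above (0.385 versus 0.487).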

  11. Shadow Areas Robust Matching Among Image Sequence in Planetary Landing

    Science.gov (United States)

    Ruoyan, Wei; Xiaogang, Ruan; Naigong, Yu; Xiaoqing, Zhu; Jia, Lin

    2017-01-01

    In this paper, an approach for robustly matching shadow areas in autonomous visual navigation and planetary landing is proposed. The approach begins by detecting shadow areas, which are extracted as Maximally Stable Extremal Regions (MSER). Second, an affine normalization algorithm is applied to normalize the areas. Third, a descriptor derived from SIFT, called Multiple Angles-SIFT (MA-SIFT), is proposed; the descriptor can extract more features of an area. Finally, to eliminate the influence of outliers, an improved RANSAC method based on Skinner Operation Condition is proposed to extract inliers. A series of experiments was conducted to test the performance of the proposed approach; the results show that it maintains a high matching accuracy even when the differences among the images are obvious and no attitude measurements are supplied.

  12. Reefgenomics.Org - a repository for marine genomics data.

    Science.gov (United States)

    Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R

    2016-01-01

    Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data. DATABASE URL: http://reefgenomics.org. © The Author(s) 2016. Published by Oxford University Press.

  13. CGI: Java software for mapping and visualizing data from array-based comparative genomic hybridization and expression profiling.

    Science.gov (United States)

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H; Lau, Ching C; Behl, Sanjiv; Man, Tsz-Kwong

    2007-10-06

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  14. CGI: Java Software for Mapping and Visualizing Data from Array-based Comparative Genomic Hybridization and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Joyce Xiuweu-Xu Gu

    2007-01-01

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  15. Enhancer Identification through Comparative Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Visel, Axel; Bristow, James; Pennacchio, Len A.

    2006-10-01

    With the availability of genomic sequence from numerous vertebrates, a paradigm shift has occurred in the identification of distant-acting gene regulatory elements. In contrast to traditional gene-centric studies in which investigators randomly scanned genomic fragments that flank genes of interest in functional assays, the modern approach begins electronically with publicly available comparative sequence datasets that provide investigators with prioritized lists of putative functional sequences based on their evolutionary conservation. However, although a large number of tools and resources are now available, application of comparative genomic approaches remains far from trivial. In particular, it requires users to dynamically consider the species and methods for comparison depending on the specific biological question under investigation. While there is currently no single general rule to this end, it is clear that when applied appropriately, comparative genomic approaches exponentially increase our power in generating biological hypotheses for subsequent experimental testing.

  16. Optimization of genome engineering approaches with the CRISPR/Cas9 system

    DEFF Research Database (Denmark)

    Li, Kai; Wang, Gang; Andersen, Troels

    2014-01-01

    Designer nucleases such as TALENS and Cas9 have opened new opportunities to scarlessly edit the mammalian genome. Here we explored several parameters that influence Cas9-mediated scarless genome editing efficiency in murine embryonic stem cells. Optimization of transfection conditions and enrichi...

  17. Hierarchical Matching of Traffic Information Services Using Semantic Similarity

    Directory of Open Access Journals (Sweden)

    Zongtao Duan

    2018-01-01

    Service matching aims to find the information similar to a given query, which has numerous applications in web search. Although existing methods yield promising results, they are not applicable to transportation. In this paper, we propose a multilevel matching method based on semantic technology for efficiently searching the requested traffic information. Our approach is divided into two stages: service clustering, which prunes candidate services that are not promising, and functional matching. The similarity at function level between services is computed by grouping the connections between the services into inheritance and noninheritance relationships. We also developed a three-layer framework with a semantic similarity measure that requires less time and space than existing methods, since the scale of candidate services is significantly smaller than the whole transportation network. The OWL_TC4 based service set was used to verify the proposed approach. The accuracy of offline service clustering reached 93.80%, and it reduced the response time to 651 ms when the total number of candidate services was 1000. Moreover, given the different thresholds for the semantic similarity measure, the proposed mixed matching model did better in terms of recall and precision (i.e., up to 72.7% and 80%, respectively, for more than 1000 services) than the compared models based on information theory and taxonomic distance. These experimental results confirmed the effectiveness and validity of service matching for responding quickly and accurately to user queries.

  18. Impedance-matched Marx generators

    Directory of Open Access Journals (Sweden)

    W. A. Stygar

    2017-04-01

    We have conceived a new class of prime-power sources for pulsed-power accelerators: impedance-matched Marx generators (IMGs). The fundamental building block of an IMG is a brick, which consists of two capacitors connected electrically in series with a single switch. An IMG comprises a single stage or several stages distributed axially and connected in series. Each stage is powered by a single brick or several bricks distributed azimuthally within the stage and connected in parallel. The stages of a multistage IMG drive an impedance-matched coaxial transmission line with a conical center conductor. When the stages are triggered sequentially to launch a coherent traveling wave along the coaxial line, the IMG achieves electromagnetic-power amplification by triggered emission of radiation. Hence a multistage IMG is a pulsed-power analogue of a laser. To illustrate the IMG approach to prime power, we have developed conceptual designs of two ten-stage IMGs with LC time constants on the order of 100 ns. One design includes 20 bricks per stage, and delivers a peak electrical power of 1.05 TW to a matched-impedance 1.22-Ω load. The design generates 113 kV per stage and has a maximum energy efficiency of 89%. The other design includes a single brick per stage, delivers 68 GW to a matched-impedance 19-Ω load, generates 113 kV per stage, and has a maximum energy efficiency of 90%. For a given electrical-power-output time history, an IMG is less expensive and slightly more efficient than a linear transformer driver, since an IMG does not use ferromagnetic cores.

  19. Impedance-matched Marx generators

    Science.gov (United States)

    Stygar, W. A.; LeChien, K. R.; Mazarakis, M. G.; Savage, M. E.; Stoltzfus, B. S.; Austin, K. N.; Breden, E. W.; Cuneo, M. E.; Hutsel, B. T.; Lewis, S. A.; McKee, G. R.; Moore, J. K.; Mulville, T. D.; Muron, D. J.; Reisman, D. B.; Sceiford, M. E.; Wisher, M. L.

    2017-04-01

    We have conceived a new class of prime-power sources for pulsed-power accelerators: impedance-matched Marx generators (IMGs). The fundamental building block of an IMG is a brick, which consists of two capacitors connected electrically in series with a single switch. An IMG comprises a single stage or several stages distributed axially and connected in series. Each stage is powered by a single brick or several bricks distributed azimuthally within the stage and connected in parallel. The stages of a multistage IMG drive an impedance-matched coaxial transmission line with a conical center conductor. When the stages are triggered sequentially to launch a coherent traveling wave along the coaxial line, the IMG achieves electromagnetic-power amplification by triggered emission of radiation. Hence a multistage IMG is a pulsed-power analogue of a laser. To illustrate the IMG approach to prime power, we have developed conceptual designs of two ten-stage IMGs with LC time constants on the order of 100 ns. One design includes 20 bricks per stage, and delivers a peak electrical power of 1.05 TW to a matched-impedance 1.22-Ω load. The design generates 113 kV per stage and has a maximum energy efficiency of 89%. The other design includes a single brick per stage, delivers 68 GW to a matched-impedance 19-Ω load, generates 113 kV per stage, and has a maximum energy efficiency of 90%. For a given electrical-power-output time history, an IMG is less expensive and slightly more efficient than a linear transformer driver, since an IMG does not use ferromagnetic cores.
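
    The figures quoted for the two ten-stage designs can be cross-checked with elementary circuit arithmetic. The sketch below is not taken from the paper: it assumes a purely resistive matched load, so that peak power P and load voltage V are related by P = V²/R, and it assumes the stage voltages simply add in series. The function name `load_voltage` is hypothetical.

```python
import math


def load_voltage(peak_power_w, load_ohm):
    """Peak voltage across a resistive load, from P = V^2 / R (assumed model)."""
    return math.sqrt(peak_power_w * load_ohm)


N_STAGES = 10  # both conceptual designs are ten-stage IMGs

# (peak power in watts, matched load in ohms), as quoted in the abstract
designs = {
    "20 bricks per stage": (1.05e12, 1.22),
    "1 brick per stage": (68e9, 19.0),
}

for name, (power_w, load_ohm) in designs.items():
    v_total = load_voltage(power_w, load_ohm)
    v_stage_kv = v_total / N_STAGES / 1e3
    print(f"{name}: {v_total / 1e6:.2f} MV total, {v_stage_kv:.1f} kV per stage")
```

    Under these assumptions both designs come out at roughly 1.13 to 1.14 MV across the load, which is consistent with the quoted 113 kV per stage for ten stages in series.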

  20. Accelerating the Switchgrass (Panicum virgatum L.) Breeding Cycle Using Genomic Selection Approaches

    Science.gov (United States)

    Lipka, Alexander E.; Lu, Fei; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.; Costich, Denise E.

    2014-01-01

    Switchgrass (Panicum virgatum L.) is a perennial grass undergoing development as a biofuel feedstock. One of the most important factors hindering breeding efforts in this species is the need for accurate measurement of biomass yield on a per-hectare basis. Genomic selection on simple-to-measure traits that approximate biomass yield has the potential to significantly speed up the breeding cycle. Recent advances in switchgrass genomic and phenotypic resources are now making it possible to evaluate the potential of genomic selection of such traits. We leveraged these resources to study the ability of three widely-used genomic selection models to predict phenotypic values of morphological and biomass quality traits in an association panel consisting of predominantly northern adapted upland germplasm. High prediction accuracies were obtained for most of the traits, with standability having the highest ten-fold cross validation prediction accuracy (0.52). Moreover, the morphological traits generally had higher prediction accuracies than the biomass quality traits. Nevertheless, our results suggest that the quality of current genomic and phenotypic resources available for switchgrass is sufficiently high for genomic selection to significantly impact breeding efforts for biomass yield. PMID:25390940

  1. Physics-based shape matching for intraoperative image guidance

    Energy Technology Data Exchange (ETDEWEB)

    Suwelack, Stefan, E-mail: suwelack@kit.edu; Röhl, Sebastian; Bodenstedt, Sebastian; Reichard, Daniel; Dillmann, Rüdiger; Speidel, Stefanie [Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Adenauerring 2, Karlsruhe 76131 (Germany); Santos, Thiago dos; Maier-Hein, Lena [Computer-assisted Interventions, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120 (Germany); Wagner, Martin; Wünscher, Josephine; Kenngott, Hannes; Müller, Beat P. [General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 110, Heidelberg 69120 (Germany)

    2014-11-01

    Purpose: Soft-tissue deformations can severely degrade the validity of preoperative planning data during computer assisted interventions. Intraoperative imaging such as stereo endoscopic, time-of-flight, or laser range scanner data can be used to compensate for these movements. In this context, the intraoperative surface has to be matched to the preoperative model. The shape matching is especially challenging in the intraoperative setting due to noisy sensor data, only partially visible surfaces, ambiguous shape descriptors, and real-time requirements. Methods: A novel physics-based shape matching (PBSM) approach to register intraoperatively acquired surface meshes to preoperative planning data is proposed. The key idea of the method is to describe the nonrigid registration process as an electrostatic–elastic problem, where an elastic body (preoperative model) that is electrically charged slides into an oppositely charged rigid shape (intraoperative surface). It is shown that the corresponding energy functional can be efficiently solved using the finite element (FE) method. It is also demonstrated how PBSM can be combined with rigid registration schemes for robust nonrigid registration of arbitrarily aligned surfaces. Furthermore, it is shown how the approach can be combined with landmark-based methods, and its application to image guidance in laparoscopic interventions is outlined. Results: A profound analysis of the PBSM scheme based on in silico and phantom data is presented. Simulation studies on several liver models show that the approach is robust to the initial rigid registration and to parameter variations. The studies also reveal that the method achieves submillimeter registration accuracy (mean error between 0.32 and 0.46 mm). An unoptimized, single core implementation of the approach achieves near real-time performance (2 TPS, 7–19 s total registration time). It outperforms established methods in terms of speed and accuracy. Furthermore, it is shown that the

  2. Labor tax reform and equilibrium unemployment : a search and matching approach

    NARCIS (Netherlands)

    Heijdra, Ben J.; Ligthart, Jenny E.

    2004-01-01

    The paper studies simple strategies of labor tax reform in a search and matching model of the labor market featuring endogenous labor supply. Changing the composition of the tax wedge, that is, reducing a payroll tax and increasing a progressive wage tax such that the marginal tax wedge remains

  3. Labor Tax Reform and Equilibrium Unemployment : A Search and Matching Approach

    NARCIS (Netherlands)

    Heijdra, B.J.; Ligthart, J.E.

    2004-01-01

    The paper studies simple strategies of labor tax reform in a search and matching model of the labor market featuring endogenous labor supply. Changing the composition of the tax wedge, that is, reducing a payroll tax and increasing a progressive wage tax such that the marginal tax wedge remains

  4. Pigeons ("Columba Livia") Approach Nash Equilibrium in Experimental Matching Pennies Competitions

    Science.gov (United States)

    Sanabria, Federico; Thrailkill, Eric

    2009-01-01

    The game of Matching Pennies (MP), a simplified version of the more popular Rock, Paper, Scissors, schematically represents competitions between organisms with incentives to predict each other's behavior. Optimal performance in iterated MP competitions involves the production of random choice patterns and the detection of nonrandomness in the…
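
    The claim that optimal play in Matching Pennies is random can be made concrete with a small expected-payoff calculation. The sketch below uses the textbook version of the game, with a payoff of +1 to the matcher on a match and -1 otherwise; this payoff scale and the function names are assumptions for illustration and do not describe the reinforcement schedule used in the pigeon experiments.

```python
def matcher_payoff(p_heads, q_heads):
    """Expected payoff to the matcher when the matcher plays heads with
    probability p_heads and the opponent with probability q_heads.
    Payoffs are assumed to be +1 on a match and -1 on a mismatch.
    """
    match_prob = p_heads * q_heads + (1.0 - p_heads) * (1.0 - q_heads)
    return 2.0 * match_prob - 1.0


def matcher_best_response(q_heads):
    """Pure strategy (probability of heads) that exploits a biased opponent.
    Against an unbiased opponent every strategy earns zero, so 0.5 is returned.
    """
    if q_heads > 0.5:
        return 1.0
    if q_heads < 0.5:
        return 0.0
    return 0.5


# Against a 50/50 opponent, no strategy does better than zero on average.
print(matcher_payoff(0.3, 0.5))

# Against an opponent biased toward heads, always playing heads is profitable.
biased_q = 0.7
print(matcher_payoff(matcher_best_response(biased_q), biased_q))
```

    Any detectable bias can be exploited by the other player, so the only mutually unexploitable pair of strategies is for both players to choose each side with probability 0.5, which is the Nash equilibrium referred to in the title.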

  5. Genome semantics, in silico multicellular systems and the Central Dogma.

    Science.gov (United States)

    Werner, Eric

    2005-03-21

    Genomes, with their complexity and size, present what appears to be an impossible challenge. Scientists speak in terms of decades or even centuries before we will understand how genomes and their hosts, the cell and the city of cells that make up the multicellular context, function. We believe that there will be surprisingly quick progress made in our understanding of genomes. The key is to stop taking the Central Dogma as the only direction in which genome research can scale the semantics of genomes. Instead, a top-down approach coupled with a bottom-up approach may snare the unwieldy beast and make sense of genomes. The method we propose is to take in silico biology seriously. By developing in silico models of genomes, cells, and multicellular systems, we position ourselves to develop a theory of meaning for artificial genomes. Using that theory, we can then develop a natural semantics of genomes.

  6. Transit Matching for International Safeguards

    International Nuclear Information System (INIS)

    Gilligan, K.; Whitaker, M.; Oakberg, J.

    2015-01-01

    In 2013 the U.S. Department of Energy / National Nuclear Security Administration Office of Non-proliferation and International Security (NIS) supported a study of the International Atomic Energy Agency's (IAEA) processes and procedures for ensuring that shipments of nuclear material correspond to (match) their receipts (i.e., transit matching). Under Comprehensive Safeguards Agreements, Member States are obliged to declare such information within certain time frames. Nuclear weapons states voluntarily declare such information under INFCIRC/207. This study was funded by the NIS Next Generation Safeguards Initiative (NGSI) Concepts and Approaches program. Oak Ridge National Laboratory led the research, which included collaboration with the U.S. Nuclear Regulatory Commission, the U.S. Nuclear Material Management and Safeguards System (NMMSS), and the IAEA Section for Declared Information Analysis within the Department of Safeguards. The project studied the current transit matching methodologies, identified current challenges (e.g., level of effort and timeliness), and suggested improvements. This paper presents the recommendations that resulted from the study and discussions with IAEA staff. In particular, it includes a recommendation to collaboratively develop a set of best reporting practices for nuclear weapons states under INFCIRC/207. (author)

  7. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  8. Modelling relationships between match events and match outcome in elite football.

    Science.gov (United States)

    Liu, Hongyou; Hopkins, Will G; Gómez, Miguel-Angel

    2016-08-01

    Identifying match events that are related to match outcome is an important task in football match analysis. Here we have used generalised mixed linear modelling to determine relationships of 16 football match events and 1 contextual variable (game location: home/away) with the match outcome. Statistics of 320 close matches (goal difference ≤ 2) of season 2012-2013 in the Spanish First Division Professional Football League were analysed. Relationships were evaluated with magnitude-based inferences and were expressed as extra matches won or lost per 10 close matches for an increase of two within-team or between-team standard deviations (SD) of the match event (representing effects of changes in team values from match to match and of differences between average team values, respectively). There was a moderate positive within-team effect from shots on target (3.4 extra wins per 10 matches; 99% confidence limits ±1.0), and a small positive within-team effect from total shots (1.7 extra wins; ±1.0). Effects of most other match events were related to ball possession, which had a small negative within-team effect (1.2 extra losses; ±1.0) but a small positive between-team effect (1.7 extra wins; ±1.4). Game location showed a small positive within-team effect (1.9 extra wins; ±0.9). In analyses of nine combinations of team and opposition end-of-season rank (classified as high, medium, low), almost all between-team effects were unclear, while within-team effects varied depending on the strength of team and opposition. Some of these findings will be useful to coaches and performance analysts when planning training sessions and match tactics.

  9. Job Searchers, Job Matches and the Elasticity of Matching

    NARCIS (Netherlands)

    Broersma, L.; van Ours, J.C.

    1998-01-01

    This paper stresses the importance of a specification of the matching function in which the measure of job matches corresponds to the measure of job searchers. In many empirical studies on the matching function this requirement has not been fulfilled because it is difficult to find information about

  10. Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper.

    Science.gov (United States)

    Manivannan, Abinaya; Kim, Jin-Hee; Yang, Eun-Young; Ahn, Yul-Kyun; Lee, Eun-Su; Choi, Sena; Kim, Do-Sun

    2018-01-01

    Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in cuisines worldwide. Therefore, the domestication of pepper has been carried out since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reformed plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, single nucleotide polymorphisms (SNPs) offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches in the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in this review can be utilized for the betterment of pepper breeding in the future.

  11. Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper

    Directory of Open Access Journals (Sweden)

    Abinaya Manivannan

    2018-01-01

    Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in cuisines worldwide. Therefore, the domestication of pepper has been carried out since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reformed plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, single nucleotide polymorphisms (SNPs) offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches in the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in this review can be utilized for the betterment of pepper breeding in the future.

  12. Assembly of viral genomes from metagenomes

    NARCIS (Netherlands)

    S.L. Smits (Saskia); R. Bodewes (Rogier); A. Ruiz-Gonzalez (Aritz); V. Baumgärtner (Volkmar); M.P.G. Koopmans D.V.M. (Marion); A.D.M.E. Osterhaus (Albert); A. Schürch (Anita)

    2014-01-01

    Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow

  13. Implementing Genome-Driven Oncology

    Science.gov (United States)

    Hyman, David M.; Taylor, Barry S.; Baselga, José

    2017-01-01

    Early successes in identifying and targeting individual oncogenic drivers, together with the increasing feasibility of sequencing tumor genomes, have brought forth the promise of genome-driven oncology care. As we expand the breadth and depth of genomic analyses, the biological and clinical complexity of its implementation will be unparalleled. Challenges include target credentialing and validation, implementing drug combinations, clinical trial designs, targeting tumor heterogeneity, and deploying technologies beyond DNA sequencing, among others. We review how contemporary approaches are tackling these challenges and will ultimately serve as an engine for biological discovery and increase our insight into cancer and its treatment. PMID:28187282

  14. Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Gregory P. Way

    2018-04-01

    Summary: Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these “hidden responders” may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA) PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders. Way et al. develop a machine-learning approach using PanCanAtlas data to detect Ras activation in cancer. Integrating mutation, copy number, and expression data, the authors show that their method detects Ras-activating variants in tumors and sensitivity to MEK inhibitors in cell lines. Keywords: Gene expression, machine learning, Ras, NF1, KRAS, NRAS, HRAS, pan-cancer, TCGA, drug sensitivity

  15. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    Science.gov (United States)

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  16. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    Science.gov (United States)

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.

  17. Recognizing human actions by learning and matching shape-motion prototype trees.

    Science.gov (United States)

    Jiang, Zhuolin; Lin, Zhe; Davis, Larry S

    2012-03-01

    A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
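
    The look-up-table idea in this abstract can be illustrated with a minimal sketch. It assumes that frames have already been assigned prototype indices, that prototype distances are Euclidean, and that sequences are compared with standard dynamic time warping; the abstract only says "dynamic prototype sequence matching", so the DTW recurrence and the names `build_lookup` and `dtw_distance` are assumptions made for illustration.

```python
def build_lookup(prototypes):
    """Precompute every prototype-to-prototype Euclidean distance once."""
    n = len(prototypes)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(prototypes[i], prototypes[j])) ** 0.5
            table[i][j] = table[j][i] = d
    return table


def dtw_distance(seq_a, seq_b, table):
    """Dynamic time warping over prototype-index sequences.

    Each local cost is a table look-up rather than a frame-to-frame
    distance computation.
    """
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = table[seq_a[i - 1]][seq_b[j - 1]]
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


# Toy prototypes in a 2D feature space (illustrative values only).
prototypes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
table = build_lookup(prototypes)
same_action = dtw_distance([0, 1, 2], [0, 0, 1, 2, 2], table)   # same order, different speed
other_action = dtw_distance([0, 1, 2], [2, 1, 0], table)        # reversed order
```

    In this toy case the time-stretched sequence aligns at zero cost while the reversed one does not, which is the behaviour one would want from an alignment-based matcher.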

  18. The genome editing revolution

    DEFF Research Database (Denmark)

    Stella, Stefano; Montoya, Guillermo

    2016-01-01

    -Cas system has become the main tool for genome editing in many laboratories. Currently the targeted genome editing technology has been used in many fields and may be a possible approach for human gene therapy. Furthermore, it can also be used to modify the genomes of model organisms for studying human......In the last 10 years, we have witnessed a blooming of targeted genome editing systems and applications. The area was revolutionized by the discovery and characterization of the transcription activator-like effector proteins, which are easier to engineer to target new DNA sequences than...... sequence). This ribonucleoprotein complex protects bacteria from invading DNAs, and it was adapted to be used in genome editing. The CRISPR ribonucleic acid (RNA) molecule guides to the specific DNA site the Cas9 nuclease to cleave the DNA target. Two years and more than 1000 publications later, the CRISPR...

  19. A zero-one programming approach to Gulliksen's matched random subtests method

    NARCIS (Netherlands)

    van der Linden, Willem J.; Boekkooi-Timminga, Ellen

    1988-01-01

    Gulliksen’s matched random subtests method is a graphical method to split a test into parallel test halves. The method has practical relevance because it maximizes coefficient α as a lower bound to the classical test reliability coefficient. In this paper the same problem is formulated as a zero-one

  20. The complete mitochondrial genome of the tiger tail seahorse, Hippocampus comes (Teleostei, Syngnathidae).

    Science.gov (United States)

    Chang, Chia-Hao; Lin, Han-Yang; Jang-Liaw, Nian-Hong; Shao, Kwang-Tsao; Lin, Yeong-Shin; Ho, Hsuan-Ching

    2013-06-01

    The complete mitochondrial genome of the tiger tail seahorse was sequenced using a polymerase chain reaction-based method. The mitochondrial DNA is 16,525 bp long and includes 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a control region. The mitochondrial gene arrangement of the tiger tail seahorse matches that observed in most vertebrates. The base composition of the genome is A (32.8%), T (29.8%), C (23.0%), and G (14.4%), showing the A+T-rich hallmark of other vertebrate mitochondrial genomes.
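
    As a small worked check of the A+T-rich claim, the reported percentages can be summed directly. The sketch below also shows how such a composition could be computed from a sequence string; the function name `base_composition` and the toy sequence are illustrative and do not come from the paper.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of each nucleotide (A, C, G, T) in a DNA sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Percentages as reported in the abstract.
reported = {"A": 32.8, "T": 29.8, "C": 23.0, "G": 14.4}
at_content = round(reported["A"] + reported["T"], 1)   # 62.6
gc_content = round(reported["C"] + reported["G"], 1)   # 37.4

# Toy sequence, for illustration only.
toy = base_composition("AACG")
```

    The reported values sum to 100%, with A+T at 62.6% against G+C at 37.4%.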

  1. Genome-wide approaches towards identification of susceptibility genes in complex diseases

    NARCIS (Netherlands)

    Franke, L.H.

    2008-01-01

    Throughout the human genome millions of places exist where humans differ genetically. The aim of this PhD thesis was to systematically assess this genetic variation and its biological consequences in a genome-wide way, through the utilization of DNA oligonucleotide arrays that assess hundreds of

  2. Construction of basic match schedules for sports competitions by using graph theory

    NARCIS (Netherlands)

    van Weert, Arjan; Schreuder, J.A.M.; Burke, Edmund; Carter, Michael

    1997-01-01

    Basic Match Schedules are important for constructing sports timetables. Firstly these schedules guarantee the fairness of the sports competitions and secondly they reduce the complexity of the problem. This paper presents an approach to the problem of finding Basic Match Schedules for sports

  3. 78 FR 73195 - Privacy Act of 1974: CMS Computer Matching Program Match No. 2013-01; HHS Computer Matching...

    Science.gov (United States)

    2013-12-05

    ... 1974: CMS Computer Matching Program Match No. 2013-01; HHS Computer Matching Program Match No. 1312 AGENCY: Centers for Medicare & Medicaid Services (CMS), Department of Health and Human Services (HHS... Privacy Act of 1974 (5 U.S.C. 552a), as amended, this notice announces the renewal of a CMP that CMS plans...

  4. Comparison of shade matching by visual observation and an intraoral dental colorimeter.

    Science.gov (United States)

    Li, Q; Wang, Y N

    2007-11-01

    The purpose of this study was to compare the applicability of two shade-matching approaches: Vintage Halo shade guide (visual method) and Shofu ShadeEye NCC colorimeter (instrumental method). Twenty participants' maxillary left central incisors were evaluated. Corresponding metal ceramic crowns were fabricated with each shade-matching approach. The colour distributions (L*, a* and b*) of the middle third region of each tooth and corresponding metal ceramic crowns were spectrophotometrically assessed. The colour difference (DeltaE) and colour distributions (DeltaL*, Deltaa* and Deltab*) between the tooth and the corresponding crowns were calculated. We found that the colour differences of both groups fell within the clinically unacceptable range (DeltaE > 2.75). Regarding DeltaE and the three colour distributions, no significant difference was found, except for a* (P colorimeter nor the visual approach. However, the colorimeter can achieve better results within easy matching cases.
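
    The colour-difference calculation mentioned here can be sketched as follows, assuming the common CIE76 definition (Euclidean distance in L*a*b* space); the abstract does not state which DeltaE formula the authors used. The tooth and crown values are made up for illustration, and only the 2.75 threshold is taken from the abstract.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) triples."""
    d_l, d_a, d_b = (x - y for x, y in zip(lab1, lab2))
    return math.sqrt(d_l ** 2 + d_a ** 2 + d_b ** 2)


CLINICAL_THRESHOLD = 2.75        # threshold quoted in the abstract

tooth = (70.0, 2.0, 18.0)        # hypothetical measurements
crown = (73.0, 2.0, 22.0)
difference = delta_e_cie76(tooth, crown)          # sqrt(3^2 + 0^2 + 4^2) = 5.0
unacceptable = difference > CLINICAL_THRESHOLD
```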

  5. Real-time UAV trajectory generation using feature points matching between video image sequences

    Science.gov (United States)

    Byun, Younggi; Song, Jeongheon; Han, Dongyeob

    2017-09-01

    Unmanned aerial vehicles (UAVs), equipped with navigation systems and video capability, are currently being deployed for intelligence, reconnaissance and surveillance missions. In this paper, we present a systematic approach for the generation of UAV trajectory using a video image matching system based on SURF (Speeded-Up Robust Features) and Preemptive RANSAC (Random Sample Consensus). Video image matching to find matching points is one of the most important steps for the accurate generation of UAV trajectory (sequence of poses in 3D space). We used the SURF algorithm to find the matching points between video image sequences, and removed mismatches by using Preemptive RANSAC, which divides all matching points into outliers and inliers. Only the inliers are used to determine the epipolar geometry for estimating the relative pose (rotation and translation) between image sequences. Experimental results from simulated video image sequences showed that our approach has good potential to be applied to the automatic geo-localization of UAV systems.

  6. Comparison of phasing strategies for whole human genomes.

    Science.gov (United States)

    Choi, Yongwook; Chan, Agnes P; Kirkness, Ewen; Telenti, Amalio; Schork, Nicholas J

    2018-04-01

    Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not 'phase' the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available 'Genome-In-A-Bottle' (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a
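
    One of the metrics listed above, the switch error rate, can be sketched in a few lines. This assumes a simplified setting: a single phased block, haplotypes given as allele strings over the same heterozygous sites, and no missing calls. The function name and example strings are illustrative and are not taken from the study.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous-site pairs whose relative phase is wrong.

    `truth` and `predicted` are allele strings (e.g. "0101") for one haplotype
    over the same ordered heterozygous sites.
    """
    if len(truth) != len(predicted) or len(truth) < 2:
        raise ValueError("need two equally long haplotypes with at least two sites")
    # True where the predicted haplotype carries the same allele as the truth.
    agrees = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)


one_switch = switch_error_rate("010110", "010001")   # phase flips once after site 3
mirrored = switch_error_rate("010110", "101001")     # complementary haplotype, no switch
```

    A fully complementary prediction scores zero because it describes the same phasing with the two haplotype labels swapped.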

  7. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.

    Science.gov (United States)

    Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  8. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies

    Science.gov (United States)

    Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  9. Matching Expectations for Successful University Student Volunteering

    Science.gov (United States)

    Paull, Megan; Omari, Maryam; MacCallum, Judith; Young, Susan; Walker, Gabrielle; Holmes, Kirsten; Haski-Leventhal, Debbie; Scott, Rowena

    2017-01-01

    Purpose: The purpose of this paper is to demonstrate the importance of expectation formation and matching for university student volunteers and their hosts. Design/methodology/approach: This research involved a multi-stage data collection process including interviews with student volunteers, and university and host representatives from six…

  10. Entropy-Weighted Instance Matching Between Different Sourcing Points of Interest

    Directory of Open Access Journals (Sweden)

    Lin Li

    2016-01-01

    The crucial problem for integrating geospatial data is finding the corresponding objects (the counterpart) from different sources. Most current studies focus on object matching with individual attributes such as spatial, name, or other attributes, which avoids the difficulty of integrating those attributes, but at the cost of ineffective matching. In this study, we propose an approach for matching instances by integrating heterogeneous attributes with the allocation of suitable attribute weights via information entropy. First, a normalized similarity formula is developed, which can simplify the calculation of spatial attribute similarity. Second, sound-based and word segmentation-based methods are adopted to eliminate the semantic ambiguity when there is a lack of a normative coding standard in geospatial data to express the name attribute. Third, category mapping is established to address the heterogeneity among different classifications. Finally, to address the non-linear characteristic of attribute similarity, the weights of the attributes are calculated by the entropy of the attributes. Experiments demonstrate that the Entropy-Weighted Approach (EWA) has good performance both in terms of precision and recall for instance matching from different data sets.
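
    The abstract does not give the weighting formula, so the following is only a sketch of the textbook entropy weight method, which may differ from the authors' EWA. Rows are candidate object pairs, columns are attribute similarities (for example spatial, name, category), and all values are hypothetical and assumed positive.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: attributes whose values vary more get larger weights."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column if x > 0]
        # Normalised Shannon entropy in [0, 1]; 1 means the column is uniform.
        entropy = -sum(p * math.log(p) for p in probs) / math.log(n_rows)
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0:
        return [1.0 / n_cols] * n_cols   # every attribute is uninformative
    return [d / spread for d in divergences]


# Hypothetical similarity scores: columns = (spatial, name, category).
similarities = [
    [0.95, 0.50, 0.9],
    [0.10, 0.50, 0.8],
    [0.05, 0.50, 0.7],
    [0.10, 0.50, 0.9],
]
weights = entropy_weights(similarities)
```

    In this toy matrix the constant second column receives essentially no weight, while the highly discriminating first column dominates.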

  11. Multi-patch matching for person re-identification

    Science.gov (United States)

    Labidi, Hocine; Luo, Sen-Lin; Boubekeur, Mohamed B.; Benlefki, Tarek

    2015-08-01

    Recognizing a target object across non-overlapping distributed cameras is known in the computer vision community as the problem of person re-identification. In this paper, a multi-patch matching method for person re-identification is presented. We start from the assumption that the appearance (clothes) of a person does not change while passing through the fields of view of different cameras, which means that regions with the same color in the target image remain identical across cameras. First, we extract distinctive features in the training procedure, where each target image is divided into small patches and the SIFT features and LAB color histograms are computed for each patch. Then we use the KNN approach to detect groups of patches with high similarity in the target image, and we use a bi-directional weighted group matching mechanism for the re-identification. Experiments on the challenging VIPeR dataset show that the proposed method outperforms several baselines and state-of-the-art approaches.

  12. Source-to-accelerator quadrupole matching section for a compact linear accelerator

    Science.gov (United States)

    Seidl, P. A.; Persaud, A.; Ghiorso, W.; Ji, Q.; Waldron, W. L.; Lal, A.; Vinayakumar, K. B.; Schenkel, T.

    2018-05-01

    Recently, we presented a new approach for a compact radio-frequency (RF) accelerator structure and demonstrated the functionality of the individual components: acceleration units and focusing elements. In this paper, we combine these units to form a working accelerator structure: a matching section between the ion source extraction grids and the RF-acceleration unit and electrostatic focusing quadrupoles between successive acceleration units. The matching section consists of six electrostatic quadrupoles (ESQs) fabricated using 3D-printing techniques. The matching section enables us to capture more beam current and to match the beam envelope to conditions for stable transport in an acceleration lattice. We present data from an integrated accelerator consisting of the source, matching section, and an ESQ doublet sandwiched between two RF-acceleration units.

  13. Screening synteny blocks in pairwise genome comparisons through integer programming.

    Science.gov (United States)

    Tang, Haibao; Lyons, Eric; Pedersen, Brent; Schnable, James C; Paterson, Andrew H; Freeling, Michael

    2011-04-18

    It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events. We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota
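
    The quota idea can be illustrated with a toy sketch. It is not QUOTA-ALIGN: the paper solves a binary integer program with existing solvers, whereas this example enumerates every subset, which is only feasible for a handful of blocks. The block representation, the depth-based reading of the quota, and all scores and coordinates are assumptions made for illustration.

```python
from itertools import combinations


def max_depth(intervals):
    """Largest number of closed intervals that overlap at any single position."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end + 1, -1))   # interval stops covering after `end`
    events.sort()                      # at equal positions, closings sort first
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def screen_blocks(blocks, quota_a, quota_b):
    """Pick the highest-scoring subset of blocks that respects both depth quotas.

    Each block is (score, (a_start, a_end), (b_start, b_end)).
    """
    best_score, best_subset = 0, ()
    for size in range(1, len(blocks) + 1):
        for subset in combinations(range(len(blocks)), size):
            a_spans = [blocks[i][1] for i in subset]
            b_spans = [blocks[i][2] for i in subset]
            if max_depth(a_spans) <= quota_a and max_depth(b_spans) <= quota_b:
                score = sum(blocks[i][0] for i in subset)
                if score > best_score:
                    best_score, best_subset = score, subset
    return best_score, best_subset


# Hypothetical blocks; a region of genome A may be covered twice, genome B once.
blocks = [
    (100, (1, 50), (1, 50)),
    (80, (1, 50), (101, 150)),
    (30, (1, 50), (201, 250)),
    (20, (60, 90), (1, 40)),
]
best_score, best_subset = screen_blocks(blocks, quota_a=2, quota_b=1)
```

    Here the two strongest blocks are kept, the third is dropped because it would exceed the depth quota on genome A, and the fourth because it overlaps an already selected region of genome B.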

  14. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
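
    The final reliability rule (at least two relevant descriptors per gene) can be sketched as a simple set filter. This is not the INDIGO script: the function name, the data layout, and the placeholder identifiers such as "GO:A" and "PS:X" are invented for illustration and are not real GO terms or PROSITE IDs.

```python
def reliable_hits(annotations, relevant_go, relevant_prosite, min_descriptors=2):
    """Keep genes carrying at least `min_descriptors` relevant GO terms and/or PROSITE IDs.

    `annotations` maps a gene name to a (go_terms, prosite_ids) pair.
    """
    reliable = {}
    for gene, (go_terms, prosite_ids) in annotations.items():
        profile_hits = set(go_terms) & relevant_go            # profile filter
        pattern_hits = set(prosite_ids) & relevant_prosite    # pattern filter
        matched = profile_hits | pattern_hits
        if len(matched) >= min_descriptors:
            reliable[gene] = sorted(matched)
    return reliable


# Placeholder identifiers, not real database accessions.
relevant_go = {"GO:A", "GO:B"}
relevant_prosite = {"PS:X"}
annotations = {
    "gene_1": (["GO:A", "GO:B"], []),
    "gene_2": (["GO:A"], ["PS:Y"]),
    "gene_3": (["GO:C"], ["PS:X"]),
    "gene_4": (["GO:B"], ["PS:X"]),
}
hits = reliable_hits(annotations, relevant_go, relevant_prosite)
```

    Genes with a single relevant descriptor are dropped, which mirrors the abstract's goal of removing weakly supported annotations before laboratory work.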

  15. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. Using these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measure the fitness and appropriateness between a process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach is unable to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach utilizes the current control flow technique in the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  16. Integrated Approaches for Genome-wide Interrogation of the Druggable Non-olfactory G Protein-coupled Receptor Superfamily.

    Science.gov (United States)

    Roth, Bryan L; Kroeze, Wesley K

    2015-08-07

    G-protein-coupled receptors (GPCRs) are frequent and fruitful targets for drug discovery and development, as well as being off-targets for the side effects of a variety of medications. Much of the druggable non-olfactory human GPCR-ome remains under-interrogated, and we present here various approaches that we and others have used to shine light into these previously dark corners of the human genome. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  17. Myocardium tracking via matching distributions.

    Science.gov (United States)

    Ben Ayed, Ismail; Li, Shuo; Ross, Ian; Islam, Ali

    2009-01-01

    The goal of this study is to investigate automatic myocardium tracking in cardiac Magnetic Resonance (MR) sequences using global distribution matching via level-set curve evolution. Rather than relying on the pixelwise information as in existing approaches, distribution matching compares intensity distributions, and consequently, is well-suited to the myocardium tracking problem. Starting from a manual segmentation of the first frame, two curves are evolved in order to recover the endocardium (inner myocardium boundary) and the epicardium (outer myocardium boundary) in all the frames. For each curve, the evolution equation is sought following the maximization of a functional containing two terms: (1) a distribution matching term measuring the similarity between the non-parametric intensity distributions sampled from inside and outside the curve to the model distributions of the corresponding regions estimated from the previous frame; (2) a gradient term for smoothing the curve and biasing it toward high gradient of intensity. The Bhattacharyya coefficient is used as a similarity measure between distributions. The functional maximization is obtained by the Euler-Lagrange ascent equation of curve evolution, and efficiently implemented via level-set. The performance of the proposed distribution matching was quantitatively evaluated by comparisons with independent manual segmentations approved by an experienced cardiologist. The method was applied to ten 2D mid-cavity MR sequences corresponding to ten different subjects. Although neither shape prior knowledge nor curve coupling were used, quantitative evaluation demonstrated that the results were consistent with manual segmentations. The proposed method compares well with existing methods. The algorithm also yields a satisfying reproducibility. 
Distribution matching leads to a myocardium tracking which is more flexible and applicable than existing methods because the algorithm uses only the current data, i.e., does not
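
    The similarity measure named in this abstract, the Bhattacharyya coefficient, can be sketched on discrete histograms. The paper works with non-parametric intensity distributions inside a level-set framework; the fixed-bin histograms, the helper names, and the intensity values below are simplifying assumptions for illustration.

```python
import math


def normalized_histogram(values, n_bins, lo, hi):
    """Bin intensities from [lo, hi] into n_bins and normalise so the bins sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)   # clamp v == hi into the last bin
        counts[idx] += 1
    total = float(len(values))
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions: 1 for identical, 0 for disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical pixel intensities inside a region in consecutive frames.
model = normalized_histogram([10, 12, 15, 40, 42, 45, 47, 80], 4, 0, 100)
similar = normalized_histogram([11, 14, 41, 44, 46, 48, 82, 85], 4, 0, 100)
different = normalized_histogram([90, 92, 95, 97, 99, 91, 93, 96], 4, 0, 100)

score_similar = bhattacharyya_coefficient(model, similar)
score_different = bhattacharyya_coefficient(model, different)
```

    A region whose intensity distribution stays close to the model from the previous frame scores near 1, which is the quantity a distribution-matching term would try to maximise.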

  18. Signature detection and matching for document image retrieval.

    Science.gov (United States)

    Zhu, Guangyu; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

    2009-11-01

    As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from cluttered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.

  19. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231.

    Science.gov (United States)

    Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M

    2018-02-14

    Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.

  20. Multi Data Reservoir History Matching using the Ensemble Kalman Filter

    KAUST Repository

    Katterbauer, Klemens

    2015-05-01

    Reservoir history matching is becoming increasingly important with the growing demand for higher quality formation characterization and forecasting and the increased complexity and expenses for modern hydrocarbon exploration projects. History matching has long been dominated by adjusting reservoir parameters based solely on well data whose spatially sparse sampling has been a challenge for characterizing the flow properties in areas away from the wells. Geophysical data are widely collected nowadays for reservoir monitoring purposes, but have not yet been fully integrated into history matching and forecasting fluid flow. In this thesis, I present a pioneering approach towards incorporating different time-lapse geophysical data together for enhancing reservoir history matching and uncertainty quantification. The thesis provides several approaches to efficiently integrate multiple geophysical data, analyze the sensitivity of the history matches to observation noise, and examine the framework’s performance in several settings, such as the Norne field in Norway. The results demonstrate the significant improvements in reservoir forecasting and characterization and the synergy effects encountered between the different geophysical data. In particular, the joint use of electromagnetic and seismic data improves the accuracy of forecasting fluid properties, and the usage of electromagnetic data has led to considerably better estimates of hydrocarbon fluid components. For volatile oil and gas reservoirs the joint integration of gravimetric and InSAR data has been shown to be beneficial in detecting the influx of water and thereby improving the recovery rate. Summarizing, this thesis makes an important contribution towards integrated reservoir management and multiphysics integration for reservoir history matching.

  1. Insights from Human/Mouse genome comparisons

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  2. One-loop effective lagrangians after matching

    Energy Technology Data Exchange (ETDEWEB)

    Aguila, F. del; Santiago, J. [Universidad de Granada, Departamento de Fisica Teorica y del Cosmos and CAFPE, Granada (Spain)]; Kunszt, Z. [ETH Zuerich, Institute for Theoretical Physics, Zuerich (Switzerland)]

    2016-05-15

    We discuss the limitations of the covariant derivative expansion prescription advocated to compute the one-loop Standard Model (SM) effective lagrangian when the heavy fields couple linearly to the SM. In particular, one-loop contributions resulting from the exchange of both heavy and light fields must be explicitly taken into account through matching because the proposed functional approach alone does not account for them. We review a simple case with a heavy scalar singlet of charge -1 to illustrate the argument. As two other examples where this matching is needed and this functional method gives a vanishing result, up to renormalization of the heavy sector parameters, we re-evaluate the one-loop corrections to the T-parameter due to a heavy scalar triplet with vanishing hypercharge coupling to the Brout-Englert-Higgs boson and to a heavy vector-like quark singlet of charge 2/3 mixing with the top quark, respectively. In all cases we make use of a new code for matching fundamental and effective theories in models with arbitrary heavy field additions.

  3. Improving the Quality of the Supply-Demand-Match in Vocational Education and Training by Anticipation and "Matching Policy"

    Science.gov (United States)

    Lassnigg, Lorenz

    2008-01-01

    This article discusses the implications of a framework to improve matching supply and demand in VET by a policy to improve quality by using anticipation and foresight approaches. Analysis of the Austrian anticipation system identified some basic aspects such as policy. The analysis focused on two issues: the observation and measurement of…

  4. GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

    Science.gov (United States)

    Srinivasa, K. G.; Shree Devi, B. N.

    2017-10-01

    String searching in documents has become a tedious task with the evolution of Big Data. The generation of large data sets demands a high-performance search algorithm in areas such as text mining, information retrieval and many others. The popularity of GPUs for general-purpose computing has been increasing for various applications. It is therefore of great interest to exploit the thread feature of a GPU to provide a high-performance search algorithm. This paper proposes an optimized new approach to the N-gram model for string search in a number of lengthy documents, together with its GPU implementation. The algorithm exploits GPGPUs for searching strings in many documents, employing character-level N-gram matching with a parallel Score Table approach and search using the CUDA API. The new Score Table approach, used for frequency storage of N-grams in a document, makes the search independent of the document's length and allows faster access to the frequency values, thus decreasing the search complexity. The extensive thread feature in a GPU has been exploited to enable parallel pre-processing of trigrams in a document for Score Table creation and parallel search in a huge number of documents, thus speeding up the whole search process even for a large pattern size. Experiments were carried out for many documents of varied length and search strings from the standard Lorem Ipsum text on NVIDIA's GeForce GT 540M GPU with 96 cores. Results show that the parallel approach to Score Table creation and searching gives a good speedup over the same approach executed serially.
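
    The Score Table idea described above can be illustrated with a minimal serial sketch in Python. It is not the authors' CUDA implementation, and the abstract does not give their scoring rule: the function names and the rule used here (the smallest stored frequency over the pattern's trigrams) are assumptions chosen for illustration. The sketch only shows why a lookup costs time proportional to the pattern length rather than the document length once the table exists.

```python
from collections import Counter


def build_score_table(text, n=3):
    """Count every character-level n-gram of one document (done once per document)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pattern_score(table, pattern, n=3):
    """Hypothetical scoring rule: smallest table frequency over the pattern's n-grams.

    A score of 0 means at least one n-gram of the pattern is absent, so the
    pattern cannot occur in the document. A positive score marks the document
    as a candidate only; it does not prove an exact occurrence.
    """
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    if not grams:
        return 0  # pattern shorter than n: no n-grams to look up
    return min(table[g] for g in grams)


def search(documents, pattern, n=3):
    """Return the names of documents whose score table admits the pattern."""
    tables = {name: build_score_table(text, n) for name, text in documents.items()}
    return sorted(name for name, table in tables.items()
                  if pattern_score(table, pattern, n) > 0)


docs = {"a": "lorem ipsum dolor", "b": "sit amet"}
print(search(docs, "ipsum"))
```

    In a GPU setting, table construction and the per-document lookups are independent of one another, which is what makes them natural candidates for one-thread-per-item parallelism.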

  5. Short and long-term genome stability analysis of prokaryotic genomes.

    Science.gov (United States)

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables the characterization of pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In a first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In a second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were

  6. Sparse reconstruction using distribution agnostic bayesian matching pursuit

    KAUST Repository

    Masood, Mudassir; Al-Naffouri, Tareq Y.

    2013-01-01

    A fast matching pursuit method using a Bayesian approach is introduced for sparse signal recovery. This method performs Bayesian estimates of sparse signals even when the signal prior is non-Gaussian or unknown. It is agnostic on signal statistics

  7. Comparative genomic analysis by microbial COGs self-attraction rate.

    Science.gov (United States)

    Santoni, Daniele; Romano-Spica, Vincenzo

    2009-06-21

    Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archaea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. In contrast, genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. The self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.

  8. Match Analysis in Volleyball: a systematic review

    Directory of Open Access Journals (Sweden)

    Miguel Silva

    2016-03-01

    The present article aims to review the available literature on match analysis in adult male Volleyball. Specific key words "performance analysis", "match analysis", "game analysis", "notational analysis", "tactical analysis", "technical analysis", "outcome" and "skills" were used to search relevant databases (PubMed, Web of Science, SportDiscus, Academic Search Complete and the International Journal of Performance Analysis in Sport). The research was conducted according to PRISMA (Preferred Reporting Items for Systematic reviews and Meta analyses) guidelines. Of 3407 studies initially identified, only 34 were fully reviewed, and their outcome measures extracted and analyzed. Studies that fit all inclusion criteria were organized into two levels of analysis, according to their research design (comparative or predictive) and depending on the type of variables analyzed (skills and their relationship with success, play position and match phase). Results show that, from a methodological point of view, comparative studies are currently complemented by some predictive studies. This predictive approach emerged with the aim of identifying the relationship between variables, considering their possible interactions and consequently their effect on team performance, contributing to a better understanding of Volleyball game performance through match analysis. Taking into account the limitations of the reviewed studies, future research should provide comprehensive operational definitions for the studied variables, use more recent samples, and consider integrating the player positions and match phase contexts into the analysis of Volleyball.

  9. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

    Science.gov (United States)

    Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M

    2014-01-01

    Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

  10. Initiating genomic selection in tetraploid potato

    DEFF Research Database (Denmark)

    Sverrisdóttir, Elsa; Janss, Luc; Byrne, Stephen

    Breeding for more space and resource efficient crops is important to feed the world’s increasing population. Potatoes produce approximately twice the amount of calories per hectare compared to cereals. The traditional “mate and phenotype” breeding approach is costly and time-consuming; however......, the completion of the genome sequence of potato has enabled the application of genomics-assisted breeding technologies. Genomic selection using genome-wide molecular markers is becoming increasingly applicable to crops as the genotyping costs continue to reduce and it is thus an attractive breeding alternative...... selection, can be obtained with good prediction accuracies in tetraploid potato....

  11. Genomics and the challenging translation into conservation practice

    Science.gov (United States)

    Aaron B. A. Shafer; Jochen B. W. Wolf; Paulo C. Alves; Linnea Bergstrom; Michael W. Bruford; Ioana Brannstrom; Guy Colling; Love Dalen; Luc De Meester; Robert Ekblom; Katie D. Fawcett; Simone Fior; Mehrdad Hajibabaei; Jason A. Hill; A. Rus Hoezel; Jacob Hoglund; Evelyn L. Jensen; Johannes Krause; Torsten N. Kristensen; Michael Krutzen; John K. McKay; Anita J. Norman; Rob Ogden; E. Martin Osterling; N. Joop Ouborg; John Piccolo; Danijela Popovic; Craig R. Primmer; Floyd A. Reed; Marie Roumet; Jordi Salmona; Tamara Schenekar; Michael K. Schwartz; Gernot Segelbacher; Helen Senn; Jens Thaulow; Mia Valtonen; Andrew Veale; Philippine Vergeer; Nagarjun Vijay; Carles Vila; Matthias Weissensteiner; Lovisa Wennerstrom; Christopher W. Wheat; Piotr Zielinski

    2015-01-01

    The global loss of biodiversity continues at an alarming rate. Genomic approaches have been suggested as a promising tool for conservation practice as scaling up to genome-wide data can improve traditional conservation genetic inferences and provide qualitatively novel insights. However, the generation of genomic data and subsequent analyses and interpretations remain...

  12. Comparative genomics and association mapping approaches for blast resistant genes in finger millet using SSRs.

    Directory of Open Access Journals (Sweden)

    B Kalyana Babu

    The major limiting factor for production and productivity of the finger millet crop is blast disease caused by Magnaporthe grisea. Since the genome sequence information available for finger millet is scarce, comparative genomics plays a very important role in the identification of genes/QTLs linked to the blast resistance genes using SSR markers. In the present study, a total of 58 genic SSRs were developed for use in genetic analysis of a global collection of 190 finger millet genotypes. The 58 SSRs yielded ninety-five scorable alleles and the polymorphism information content varied from 0.186 to 0.677 with an average of 0.385. The gene diversity was in the range of 0.208 to 0.726 with an average of 0.487. Association mapping for blast resistance was done using 104 SSR markers, which identified four QTLs for finger blast and one QTL for neck blast resistance. The genomic marker RM262 and genic marker FMBLEST32 were linked to finger blast disease at a P value of 0.007 and explained phenotypic variance (R²) of 10% and 8%, respectively. The genomic marker UGEP81 was associated with finger blast at a P value of 0.009 and explained 7.5% of R². The QTL for neck blast was associated with the genomic SSR marker UGEP18 at a P value of 0.01, which explained 11% of R². Three QTLs for blast resistance were found in common using both the GLM and MLM approaches. The resistant alleles were found to be present mostly in the exotic genotypes. Among the genotypes of the NW Himalayan region of India, VHC3997, VHC3996 and VHC3930 were found to be highly resistant and may be effectively used as parents for developing blast-resistant cultivars in the NW Himalayan region of India. The markers linked to the QTLs for blast resistance in the present study can be further used for cloning of the full-length gene, fine mapping and marker-assisted breeding programmes for introgression of blast-resistant alleles into locally adapted cultivars.
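
    The abstract reports polymorphism information content (PIC) and gene diversity per marker but does not state the formulas. As a hedged illustration, the Python sketch below uses the standard textbook definitions, gene diversity H = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are the allele frequencies at one locus. Whether the study used exactly these estimators is an assumption, and the allele frequencies in the example are hypothetical, not data from the paper.

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content at one locus.

    PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2
    """
    cross = sum(2.0 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return gene_diversity(freqs) - cross


# Hypothetical SSR locus with two equally frequent alleles.
example = [0.5, 0.5]
print(gene_diversity(example), pic(example))
```

    Under these definitions PIC never exceeds gene diversity at the same locus, which agrees with the reported averages (0.385 for PIC against 0.487 for gene diversity).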

  13. QTL-seq approach identified genomic regions and diagnostic markers for rust and late leaf spot resistance in groundnut (Arachis hypogaea L.).

    Science.gov (United States)

    Pandey, Manish K; Khan, Aamir W; Singh, Vikas K; Vishwakarma, Manish K; Shasidhar, Yaduru; Kumar, Vinay; Garg, Vanika; Bhat, Ramesh S; Chitikineni, Annapurna; Janila, Pasupuleti; Guo, Baozhu; Varshney, Rajeev K

    2017-08-01

    Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to significant yield loss in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling resistance to rust and LLS, a whole-genome resequencing (WGRS)-based approach referred to as 'QTL-seq' was deployed. A total of 231.67 Gb raw and 192.10 Gb of clean sequence data were generated through WGRS of the resistant parent and the resistant and susceptible bulks for rust and LLS. Sequence analysis of bulks for rust and LLS with reference-guided resistant parent assembly identified 3136 single-nucleotide polymorphisms (SNPs) for rust and 66 SNPs for LLS with a read depth of ≥7 in the identified genomic region on pseudomolecule A03. Detailed analysis identified 30 nonsynonymous SNPs affecting 25 candidate genes for rust resistance, and 14 intronic and three synonymous SNPs affecting nine candidate genes for LLS resistance. Subsequently, allele-specific diagnostic markers were identified for three SNPs for rust resistance and one SNP for LLS resistance. Genotyping of one RIL population (TAG 24 × GPBD 4) with these four diagnostic markers revealed higher phenotypic variation for these two diseases. These results suggest the usefulness of the QTL-seq approach in precise and rapid identification of candidate genomic regions and development of diagnostic markers for breeding applications. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  14. Sunflower Hybrid Breeding: From Markers to Genomic Selection.

    Science.gov (United States)

    Dimitrijevic, Aleksandra; Horn, Renate

    2017-01-01

    In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches

  15. Sunflower Hybrid Breeding: From Markers to Genomic Selection

    Directory of Open Access Journals (Sweden)

    Aleksandra Dimitrijevic

    2018-01-01

    In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare.

  16. Sunflower Hybrid Breeding: From Markers to Genomic Selection

    Science.gov (United States)

    Dimitrijevic, Aleksandra; Horn, Renate

    2018-01-01

    In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches

  17. Template matching techniques in computer vision theory and practice

    CERN Document Server

    Brunelli, Roberto

    2009-01-01

    The detection and recognition of objects in images is a key research topic in the computer vision community. Within this area, face recognition and interpretation has attracted increasing attention owing to the possibility of unveiling human perception mechanisms, and for the development of practical biometric systems. This book and the accompanying website focus on template matching, a subset of object recognition techniques of wide applicability, which has proved to be particularly effective for face recognition applications. Using examples from face processing tasks throughout the book to illustrate more general object recognition approaches, Roberto Brunelli: examines the basics of digital image formation, highlighting points critical to the task of template matching; presents basic and advanced template matching techniques, targeting grey-level images, shapes and point sets; discusses recent pattern classification paradigms from a template matching perspective; illustrates the development of a real fac...
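
    As a hedged illustration of what basic template matching on grey-level images involves, the Python sketch below slides a small template over an image and scores each position with zero-mean normalized cross-correlation. This is one common similarity measure, chosen here as an assumption; it is not taken from the book, and the tiny image and template are made-up values.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized grey-level grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no structure to match


def best_match(image, template):
    """Exhaustively scan the image; return ((row, col), score) of the best position."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)  # NCC lies in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


image = [
    [0, 0, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 2, 8],
    [0, 0, 0, 0],
]
template = [[9, 1], [2, 8]]
print(best_match(image, template))
```

    Subtracting the means and dividing by the norms makes the score insensitive to uniform brightness and contrast changes, which is the usual reason for preferring it over a raw sum of squared differences.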

  18. MULTI-TEMPORAL AND MULTI-SENSOR IMAGE MATCHING BASED ON LOCAL FREQUENCY INFORMATION

    Directory of Open Access Journals (Sweden)

    X. Liu

    2012-08-01

    Image Matching is often one of the first tasks in many Photogrammetry and Remote Sensing applications. This paper presents an efficient approach to automated multi-temporal and multi-sensor image matching based on local frequency information. Two new independent image representations, Local Average Phase (LAP) and Local Weighted Amplitude (LWA), are presented to emphasize the common scene information, while suppressing the non-common illumination and sensor-dependent information. In order to get the two representations, local frequency information is first obtained from a Log-Gabor wavelet transformation, which is similar to that of the human visual system; then the outputs of odd and even symmetric filters are used to construct the LAP and LWA. The LAP and LWA emphasize the phase and amplitude information, respectively. As these two representations are both derivative-free and threshold-free, they are robust to noise and can keep as much of the image detail as possible. A new Compositional Similarity Measure (CSM) is also presented to combine the LAP and LWA with the same weight for measuring the similarity of multi-temporal and multi-sensor images. The CSM can make the LAP and LWA compensate for each other and can make full use of the amplitude and phase of local frequency information. In many image matching applications, the template is usually selected without consideration of its matching robustness and accuracy. In order to overcome this problem, a local best matching point detection is presented to detect the best matching template. In the detection method, we employ self-similarity analysis to identify the template with the highest matching robustness and accuracy. Experimental results using some real images and simulation images demonstrate that the presented approach is effective for matching image pairs with significant scene and illumination changes and that it has advantages over other state-of-the-art approaches, which include: the

  19. Human genome project: revolutionizing biology through leveraging technology

    Science.gov (United States)

    Dahl, Carol A.; Strausberg, Robert L.

    1996-04-01

    The Human Genome Project (HGP) is an international project to develop genetic, physical, and sequence-based maps of the human genome. Since the inception of the HGP it has been clear that substantially improved technology would be required to meet the scientific goals, particularly in order to acquire the complete sequence of the human genome, and that these technologies coupled with the information forthcoming from the project would have a dramatic effect on the way biomedical research is performed in the future. In this paper, we discuss the state-of-the-art for genomic DNA sequencing, technological challenges that remain, and the potential technological paths that could yield substantially improved genomic sequencing technology. The impact of the technology developed from the HGP is broad-reaching and a discussion of other research and medical applications that are leveraging HGP-derived DNA analysis technologies is included. The multidisciplinary approach to the development of new technologies that has been successful for the HGP provides a paradigm for facilitating new genomic approaches toward understanding the biological role of functional elements and systems within the cell, including those encoded within genomic DNA and their molecular products.

  20. Glycogenomics as a mass spectrometry-guided genome-mining method for microbial glycosylated molecules.

    Science.gov (United States)

    Kersten, Roland D; Ziemert, Nadine; Gonzalez, David J; Duggan, Brendan M; Nizet, Victor; Dorrestein, Pieter C; Moore, Bradley S

    2013-11-19

    Glycosyl groups are an essential mediator of molecular interactions in cells and on cellular surfaces. There are very few methods that directly relate sugar-containing molecules to their biosynthetic machineries. Here, we introduce glycogenomics as an experiment-guided genome-mining approach for fast characterization of glycosylated natural products (GNPs) and their biosynthetic pathways from genome-sequenced microbes by targeting glycosyl groups in microbial metabolomes. Microbial GNPs consist of aglycone and glycosyl structure groups in which the sugar unit(s) are often critical for the GNP's bioactivity, e.g., by promoting binding to a target biomolecule. GNPs are a structurally diverse class of molecules with important pharmaceutical and agrochemical applications. Herein, O- and N-glycosyl groups are characterized in their sugar monomers by tandem mass spectrometry (MS) and matched to corresponding glycosylation genes in secondary metabolic pathways by a MS-glycogenetic code. The associated aglycone biosynthetic genes of the GNP genotype then classify the natural product to further guide structure elucidation. We highlight the glycogenomic strategy by the characterization of several bioactive glycosylated molecules and their gene clusters, including the anticancer agent cinerubin B from Streptomyces sp. SPB74 and an antibiotic, arenimycin B, from Salinispora arenicola CNB-527.

  1. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-03-01

    A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1) the whole-genome-based and alignment-free CVTree using 179 genomes; (2) the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3) the Second Edition of Bergey’s Manual of Systematic Bacteriology complemented by some current literature. A high degree of agreement is reached at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, the CVTree helped to determine the taxonomic position of some newly sequenced genomes without proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.

  2. Copy number variation analysis of matched ovarian primary tumors and peritoneal metastasis.

    Directory of Open Access Journals (Sweden)

    Joel A Malek

    Ovarian cancer is the most deadly gynecological cancer. The high rate of mortality is due to the large tumor burden with extensive metastatic lesions of the abdominal cavity. Despite initial chemosensitivity and improved surgical procedures, abdominal recurrence remains an issue and results in patients' poor prognosis. Transcriptomic and genetic studies have revealed significant genome pathologies in the primary tumors and yielded important information regarding carcinogenesis. There are, however, few studies on genetic alterations and their consequences in peritoneal metastatic tumors when compared to their matched ovarian primary tumors. We used high-density SNP arrays to investigate copy number variations in matched primary and metastatic ovarian cancer from 9 patients. Here we show that copy number variations acquired by ovarian tumors are significantly different between matched primary and metastatic tumors and that these are likely due to different functional requirements. We show that these copy number variations clearly differentially affect specific pathways, including the JAK/STAT and cytokine signaling pathways. While many have shown complex involvement of cytokines in the ovarian cancer environment, we provide evidence that ovarian tumors have specific copy number variation differences in many of these genes.

  3. Cyclic Matching Pursuits with Multiscale Time-frequency Dictionaries

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Christensen, Mads Græsbøll

    2010-01-01

    We generalize cyclic matching pursuit (CMP), propose an orthogonal variant, and examine their performance using multiscale time-frequency dictionaries in the sparse approximation of signals. Overall, we find that the cyclic approach of CMP produces signal models that have a much lower approximation...
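
    As a point of reference for the greedy scheme this record builds on, the following is a minimal sketch of plain matching pursuit, not of the cyclic variant (CMP) or the orthogonal variant the authors propose. It assumes a dictionary of unit-norm atoms given as plain Python lists; the function names `dot` and `matching_pursuit` are hypothetical and do not come from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_iter):
    """Plain (non-cyclic) matching pursuit.

    Assumes every atom in `dictionary` has unit norm. At each step the atom
    most correlated with the current residual is selected and its
    contribution is subtracted from the residual.
    Returns (model, residual), where model is a list of (atom_index, coefficient).
    """
    residual = list(signal)
    model = []
    for _ in range(n_iter):
        correlations = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(correlations[i]))
        coefficient = correlations[best]
        model.append((best, coefficient))
        residual = [r - coefficient * a for r, a in zip(residual, dictionary[best])]
    return model, residual
```

    With the standard basis of R^3 as the dictionary, two iterations on the signal [3, 0, -4] select the third atom first (largest magnitude) and then the first, leaving a zero residual.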

  4. A Ranking Approach to Genomic Selection.

    Science.gov (United States)

    Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori

    2015-01-01

    Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
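
    The evaluation measure named in this abstract can be sketched as follows. This is a minimal illustration of NDCG at a cutoff k, assuming a linear gain (the observed breeding value itself, taken to be non-negative) and a log2 rank discount; the abstract does not state which gain and discount functions the authors used, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(values, k):
    """Discounted cumulative gain of the first k values (linear gain, log2 discount)."""
    return sum(v / math.log2(rank + 2) for rank, v in enumerate(values[:k]))


def ndcg_at_k(true_values, predicted_scores, k):
    """NDCG@k: DCG of the predicted ordering divided by DCG of the ideal ordering.

    true_values      -- observed breeding values (assumed non-negative)
    predicted_scores -- model scores used only to rank the individuals
    """
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_values[i] for i in order]
    ideal = sorted(true_values, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

    Because the discount shrinks with rank, a model that places high-value individuals at the top scores close to 1, which is the property the abstract highlights for selective breeding.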

  5. Molecular cytogenetic and genomic analyses reveal new insights into the origin of the wheat B genome.

    Science.gov (United States)

    Zhang, Wei; Zhang, Mingyi; Zhu, Xianwen; Cao, Yaping; Sun, Qing; Ma, Guojia; Chao, Shiaoman; Yan, Changhui; Xu, Steven S; Cai, Xiwen

    2018-02-01

    This work pinpointed the goatgrass chromosomal segment in the wheat B genome using modern cytogenetic and genomic technologies, and provided novel insights into the origin of the wheat B genome. Wheat is a typical allopolyploid with three homoeologous subgenomes (A, B, and D). The donors of the subgenomes A and D had been identified, but not for the subgenome B. The goatgrass Aegilops speltoides (genome SS) has been controversially considered a possible candidate for the donor of the wheat B genome. However, the relationship of the Ae. speltoides S genome with the wheat B genome remains largely obscure. The present study assessed the homology of the B and S genomes using an integrative cytogenetic and genomic approach, and revealed the contribution of Ae. speltoides to the origin of the wheat B genome. We discovered noticeable homology between wheat chromosome 1B and Ae. speltoides chromosome 1S, but not between other chromosomes in the B and S genomes. An Ae. speltoides-originated segment spanning a genomic region of approximately 10.46 Mb was detected on the long arm of wheat chromosome 1B (1BL). The Ae. speltoides-originated segment on 1BL was found to co-evolve with the rest of the B genome. Evidently, Ae. speltoides had been involved in the origin of the wheat B genome, but should not be considered an exclusive donor of this genome. The wheat B genome might have a polyphyletic origin with multiple ancestors involved, including Ae. speltoides. These novel findings will facilitate genome studies in wheat and other polyploids.

  6. Detection of alien chromatin introgression from Thinopyrum into wheat using S genomic DNA as a probe--a landmark approach for Thinopyrum genome research.

    Science.gov (United States)

    Chen, Q

    2005-01-01

    The introduction of alien genetic variation from the genus Thinopyrum through chromosome engineering into wheat is a valuable and proven technique for wheat improvement. A number of economically important traits have been transferred into wheat as single genes, chromosome arms or entire chromosomes. Successful transfers can be greatly assisted by the precise identification of alien chromatin in the recipient progenies. Chromosome identification and characterization are useful for genetic manipulation and transfer in wheat breeding following chromosome engineering. Genomic in situ hybridization (GISH) using an S genomic DNA probe from the diploid species Pseudoroegneria has proven to be a powerful diagnostic cytogenetic tool for monitoring the transfer of many promising agronomic traits from Thinopyrum. This specific S genomic probe not only allows the direct determination of the chromosome composition in wheat-Thinopyrum hybrids, but also can separate the Th. intermedium chromosomes into the J, J(S) and S genomes. The J(S) genome, which consists of a modified J genome chromosome distinguished by S genomic sequences of Pseudoroegneria near the centromere and telomere, carries many disease and mite resistance genes. Utilization of this S genomic probe leads to a better understanding of genomic affinities between Thinopyrum and wheat, and provides a molecular cytogenetic marker for monitoring the transfer of alien Thinopyrum agronomic traits into wheat recipient lines. Copyright 2005 S. Karger AG, Basel.

  7. Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

    Science.gov (United States)

    Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C

    2018-06-01

    Seq genomes. Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Understanding the physiology and adaptation of staphylococci: a post-genomic approach.

    Science.gov (United States)

    Becker, Karsten; Bierbaum, Gabriele; von Eiff, Christof; Engelmann, Susanne; Götz, Friedrich; Hacker, Jörg; Hecker, Michael; Peters, Georg; Rosenstein, Ralf; Ziebuhr, Wilma

    2007-11-01

    Staphylococcus aureus as well as coagulase-negative staphylococci are medically highly important pathogens characterized by an increasing resistance rate toward many antibiotics. Although normally being skin and mucosa commensals, some staphylococcal species and strains have the capacity to cause a wide range of infectious diseases. Many of these infections affect immunocompromised patients in hospitals. However, community-acquired staphylococcal infections due to resistant strains are also currently on the rise. In the light of this development, there is an urgent need for novel anti-staphylococcal therapeutic and prevention strategies for which a better understanding of the physiology of these bacteria is an essential prerequisite. Within the past years, staphylococci have been in the focus of genomic research, resulting in the determination and publication of a range of full-genome sequences of different staphylococcal species and strains which provided the basis for the design and application of DNA microarrays and other genomic tools. Here we summarize the results of the project group 'Staphylococci' within the research network 'Pathogenomics' giving new insights into the genome structure, molecular epidemiology, physiology, and genetic adaptation of both S. aureus and coagulase-negative staphylococci.

  9. A generic approach for the design of whole-genome oligoarrays, validated for genomotyping, deletion mapping and gene expression analysis on Staphylococcus aureus

    Directory of Open Access Journals (Sweden)

    Renzoni Adriana

    2005-06-01

    Background: DNA microarray technology is widely used to determine the expression levels of thousands of genes in a single experiment, for a broad range of organisms. Optimal design of immobilized nucleic acids has a direct impact on the reliability of microarray results. However, despite small genome size and complexity, prokaryotic organisms are not frequently studied to validate selected bioinformatics approaches. Relying on parameters shown to affect the hybridization of nucleic acids, we designed freely available software and experimentally validated its performance on the bacterial pathogen Staphylococcus aureus. Results: We describe an efficient procedure for selecting 40–60 mer oligonucleotide probes combining optimal thermodynamic properties with high target specificity, suitable for genomic studies of microbial species. The algorithm for filtering probes from extensive oligonucleotide libraries fitting standard thermodynamic criteria includes positional information of predicted target-probe binding regions. This algorithm efficiently selected probes recognizing homologous gene targets across three different sequenced genomes of Staphylococcus aureus. BLAST analysis of the final selection of 5,427 probes yielded >97%, 93%, and 81% of Staphylococcus aureus genome coverage in strains N315, Mu50, and COL, respectively. A manufactured oligoarray including a subset of control Escherichia coli probes was validated for applications in the fields of comparative genomics and molecular epidemiology, mapping of deletion mutations and transcription profiling. Conclusion: This generic chip-design process merging sequence information from several related genomes improves genome coverage even in conserved regions.

  10. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    Science.gov (United States)

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiencies, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgenic insertion locations in the three GM rice genomes, Illumina sequencing reads are mapped to the rice genome and the plasmid sequence and classified accordingly. Reads mapped to both are then classified to characterize the junction site between plant and transgene sequence by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate the use of a combination of NGS technology and bioinformatics approaches that offers cost- and time-effective methods for assessing the safety of transgenic plants.

  11. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement

    OpenAIRE

    Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.

    2016-01-01

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled an...

  12. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  13. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.
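
    The read-selection idea described here (keeping only reads whose index sequence is associated with many reads) can be sketched as below. This is an illustrative assumption of how such a filter might look, not the authors' pipeline: the input representation as (barcode, sequence) pairs, the function name `reads_from_frequent_barcodes` and the `min_reads` threshold are all hypothetical.

```python
from collections import Counter


def reads_from_frequent_barcodes(reads, min_reads):
    """Keep reads whose barcode (index sequence) occurs at least `min_reads` times.

    reads     -- iterable of (barcode, sequence) pairs
    min_reads -- hypothetical cutoff; the abstract does not give a value
    """
    reads = list(reads)
    counts = Counter(barcode for barcode, _ in reads)
    return [(barcode, seq) for barcode, seq in reads if counts[barcode] >= min_reads]
```

    The retained reads would then be passed to an assembler; the choice of threshold controls how strongly low-frequency indices, which per the abstract are less likely to carry chloroplast reads, are excluded.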

  14. Genomic suppression subtractive hybridization as a tool to identify differences in mycorrhizal fungal genomes.

    Science.gov (United States)

    Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola

    2011-05-01

    Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  15. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes

    Science.gov (United States)

    Peng, Qian; Alekseyev, Max A.; Tesler, Glenn; Pevzner, Pavel A.

    The existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared over all genomes to construct the synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to address the need to construct synteny blocks in many mammalian genomes that are currently being sequenced. The problem is that the number of anchors shared among all genomes quickly decreases with the increase in the number of genomes. Another problem is that many genomes (plant genomes in particular) had extensive duplications, which makes decoding of genomic architecture and rearrangement analysis in plants difficult. The existing synteny block generation algorithms in plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolution history of duplications. We present a new algorithm based on the A-Bruijn graph framework that overcomes these difficulties and provides a unified approach to synteny block reconstruction for multiple genomes, and for genomes with large duplications.

  16. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
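
    The transition/transversion split reported above follows from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch of that classification is given below; the function names `classify_snp` and `transition_fraction` and the (reference, alternate) input format are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(snps):
    """Fraction of (ref, alt) pairs that are transitions."""
    calls = [classify_snp(ref, alt) for ref, alt in snps]
    return calls.count("transition") / len(calls)
```

    For example, a set of five substitutions containing A>G, C>T and T>C plus two transversions has a transition fraction of 0.6, the same proportion the abstract reports for the discovered SNPs.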

  17. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Background: Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequencing (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results: When sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations predicting that fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed that polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion: The use of HSVs like SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  18. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.

  19. Cellular neural networks for the stereo matching problem

    International Nuclear Information System (INIS)

    Taraglio, S.; Zanela, A.

    1997-03-01

    The applicability of the Cellular Neural Network (CNN) paradigm to the problem of recovering information on the three-dimensional structure of the environment is investigated. The approach proposed is the stereo matching of video images. The starting point of this work is the Zhou-Chellappa neural network implementation for the same problem. The CNN-based system we present here yields the same results as the previous approach, but without its many drawbacks.

  20. A second generation genetic map of the bumblebee Bombus terrestris (Linnaeus, 1758) reveals slow genome and chromosome evolution in the Apidae

    Directory of Open Access Journals (Sweden)

    Kube Michael

    2011-01-01

    Background: The bumblebee Bombus terrestris is an ecologically and economically important pollinator and has become an important biological model system. To study fundamental evolutionary questions at the genomic level, a high resolution genetic linkage map is an essential tool for analyses ranging from quantitative trait loci (QTL) mapping to genome assembly and comparative genomics. We here present a saturated linkage map and match it with the Apis mellifera genome using homologous markers. This genome-wide comparison allows insights into structural conservations and rearrangements and thus the evolution on a chromosomal level. Results: The high density linkage map covers ~93% of the B. terrestris genome on 18 linkage groups (LGs) and has a length of 2,047 cM with an average marker distance of 4.02 cM. Based on a genome size of ~430 Mb, the recombination rate estimate is 4.76 cM/Mb. Sequence homologies of 242 homologous markers allowed us to match 15 B. terrestris LGs with A. mellifera LGs, five of them as composites. Comparing marker orders between both genomes, we detect over 14% of the genome to be organized in synteny and 21% in rearranged blocks on the same homologous LG. Conclusions: This study demonstrates that, despite the very high recombination rates of both A. mellifera and B. terrestris and a long divergence time of about 100 million years, the genomes' genetic architecture is highly conserved. This reflects a slow genome evolution in these bees. We show that data on genome organization and conserved molecular markers can be used as a powerful tool for comparative genomics and evolutionary studies, opening up new avenues of research in the Apidae.
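
    The recombination rate quoted in this abstract is the genetic map length divided by the physical genome size. The short sketch below reproduces that arithmetic from the two figures given in the abstract (a map length of 2047 cM and a genome size of about 430 Mb); the function name `recombination_rate` is hypothetical.

```python
def recombination_rate(map_length_cm, genome_size_mb):
    """Genome-wide average recombination rate in cM per Mb."""
    return map_length_cm / genome_size_mb


# Figures taken from the abstract: 2047 cM map length, ~430 Mb genome size.
rate = recombination_rate(2047, 430)
print(f"{rate:.2f} cM/Mb")
```

    Rounded to two decimals this gives 4.76 cM/Mb, matching the estimate stated in the abstract.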

  1. Computational pan-genomics: status, promises and challenges

    NARCIS (Netherlands)

    The Computational Pan-Genomics Consortium; T. Marschall (Tobias); M. Marz (Manja); T. Abeel (Thomas); L.J. Dijkstra (Louis); B.E. Dutilh (Bas); A. Ghaffaari (Ali); P. Kersey (Paul); W.P. Kloosterman (Wigard); V. Mäkinen (Veli); A.M. Novak (Adam); B. Paten (Benedict); D. Porubsky (David); E. Rivals (Eric); C. Alkan (Can); J.A. Baaijens (Jasmijn); P.I.W. de Bakker (Paul); V. Boeva (Valentina); R.J.P. Bonnal (Raoul); F. Chiaromonte (Francesca); R. Chikhi (Rayan); F.D. Ciccarelli (Francesca); C.P. Cijvat (Robin); E. Datema (Erwin); C.M. van Duijn (Cornelia); E.E. Eichler (Evan); C. Ernst (Corinna); E. Eskin (Eleazar); E. Garrison (Erik); M. El-Kebir (Mohammed); G.W. Klau (Gunnar); J.O. Korbel (Jan); E.-W. Lameijer (Eric-Wubbo); B. Langmead (Benjamin); M. Martin; P. Medvedev (Paul); J.C. Mu (John); P.B.T. Neerincx (Pieter); K. Ouwens (Klaasjan); P. Peterlongo (Pierre); N. Pisanti (Nadia); S. Rahmann (Sven); B.J. Raphael (Benjamin); K. Reinert (Knut); D. de Ridder (Dick); J. de Ridder (Jeroen); M. Schlesner (Matthias); O. Schulz-Trieglaff (Ole); A.D. Sanders (Ashley); S. Sheikhizadeh (Siavash); C. Shneider (Carl); S. Smit (Sandra); D. Valenzuela (Daniel); J. Wang (Jiayin); L.F.A. Wessels (Lodewyk); Y. Zhang (Ying); V. Guryev (Victor); F. Vandin (Fabio); K. Ye (Kai); A. Schönhuth (Alexander)

    2018-01-01

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few

  2. Random Tagging Genotyping by Sequencing (rtGBS), an Unbiased Approach to Locate Restriction Enzyme Sites across the Target Genome.

    Directory of Open Access Journals (Sweden)

    Elena Hilario

    Full Text Available Genotyping by sequencing (GBS) is a restriction enzyme based targeted approach developed to reduce genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples that can be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Loci bias was observed using a slight modification of the Elshire et al. method: some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS). By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart and undetected by the standard GBS (stdGBS) method. The study comprises two types of biological replicates (six different kiwifruit plants and two independent DNA extractions per plant) and three types of technical replicates (four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes). A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145) of BamH I sites shared with the reference genome, compared to only 14% (11,513) by stdGBS.

  3. Automated genome mining of ribosomal peptide natural products

    Energy Technology Data Exchange (ETDEWEB)

    Mohimani, Hosein; Kersten, Roland; Liu, Wei; Wang, Mingxun; Purvine, Samuel O.; Wu, Si; Brewer, Heather M.; Pasa-Tolic, Ljiljana; Bandeira, Nuno; Moore, Bradley S.; Pevzner, Pavel A.; Dorrestein, Pieter C.

    2014-07-31

    Ribosomally synthesized and posttranslationally modified peptides (RiPPs), especially from microbial sources, are a large group of bioactive natural products that are a promising source of new (bio)chemistry and bioactivity (1). In light of exponentially increasing microbial genome databases and improved mass spectrometry (MS)-based metabolomic platforms, there is a need for computational tools that connect natural product genotypes predicted from microbial genome sequences with their corresponding chemotypes from metabolomic datasets. Here, we introduce RiPPquest, a tandem mass spectrometry database search tool for identification of microbial RiPPs and apply it for lanthipeptide discovery. RiPPquest uses genomics to limit search space to the vicinity of RiPP biosynthetic genes and proteomics to analyze extensive peptide modifications and compute p-values of peptide-spectrum matches (PSMs). We highlight RiPPquest by connection of multiple RiPPs from extracts of Streptomyces to their gene clusters and by the discovery of a new class III lanthipeptide, informatipeptin, from Streptomyces viridochromogenes DSM 40736 as the first natural product to be identified in an automated fashion by genome mining. The presented tool is available at cy-clo.ucsd.edu.

  4. GenoSets: visual analytic methods for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Aurora A Cain

    Full Text Available Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.

  5. Tolerating Correlated Failures for Generalized Cartesian Distributions via Bipartite Matching

    International Nuclear Information System (INIS)

    Ali, Nawab; Krishnamoorthy, Sriram; Halappanavar, Mahantesh; Daily, Jeffrey A.

    2011-01-01

    Faults are expected to play an increasingly important role in how algorithms and applications are designed to run on future extreme-scale systems. A key ingredient of any approach to fault tolerance is effective support for fault tolerant data storage. A typical application execution consists of phases in which certain data structures are modified while others are read-only. Often, read-only data structures constitute a large fraction of total memory consumed. Fault tolerance for read-only data can be ensured through the use of checksums or parities, without resorting to expensive in-memory duplication or checkpointing to secondary storage. In this paper, we present a graph-matching approach to compute and store parity data for read-only matrices that are compatible with fault tolerant linear algebra (FTLA). Typical approaches only support blocked data distributions with each process holding one block with the parity located on additional processes. The matrices are assumed to be blocked by a cartesian grid with each block assigned to a process. We consider a generalized distribution in which each process can be assigned arbitrary blocks. We also account for the fact that multiple processes might be part of the same failure unit, say an SMP node. The flexibility enabled by our novel application of graph matching extends fault tolerance support to data distributions beyond those supported by prior work. We evaluate the matching implementations and cost to compute the parity and recover lost data, demonstrating the low overhead incurred by our approach.
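
    The idea that read-only data can be protected by parity rather than by duplication can be illustrated with a toy example. The sketch below uses bytewise XOR parity over equally sized blocks, which is an assumption made for illustration: the abstract does not say which parity or checksum scheme the authors use, and the graph-matching placement of parity blocks is not modeled here.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equally sized blocks (illustrative parity scheme)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three hypothetical read-only blocks, each held by a different process.
blocks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\x0f\x0f\x0f"]
parity = xor_parity(blocks)

# Suppose the process holding blocks[1] fails: XOR of the survivors
# and the parity block restores the lost data.
recovered = xor_parity([blocks[0], blocks[2], parity])
```

    A single XOR parity block tolerates the loss of one block per parity group. This is why placement matters when several processes share a failure unit, which is the problem the matching formulation in the abstract addresses.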

  6. Data of 10 SSR markers for genomes of homo sapiens and monkeys.

    Science.gov (United States)

    Reddy, K K V V V S; Raju, S Viswanadha; Someswara Rao, Chinta

    2017-06-01

    In this data set, we present 10 Simple Sequence Repeat (SSR) markers, TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA, which were extracted from the genomes of Homo sapiens and monkeys using a string matching mechanism [1]. All loci showed an allele size of 4 base pairs (bp), indicating that there are some polymorphisms between individuals correlating with the number of SSR repeats that may be useful for detecting similarity among the genotypes. Collectively, these data show that SSR extraction is a valuable method to illustrate genetic variation of genomes.
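
    The abstract does not describe the string matching mechanism in detail. As a minimal sketch, assuming that an SSR locus is a run of at least two tandem copies of a motif, a regular expression can locate such runs; the function name, the repeat threshold and the toy sequence are all hypothetical.

```python
import re

# The ten 4-bp motifs listed in the abstract.
MOTIFS = ["TAGA", "TCAT", "GAAT", "AGAT", "AGAA",
          "GATA", "TATC", "CTTT", "TCTG", "TCTA"]


def find_ssr_runs(sequence, motif, min_repeats=2):
    """Return (start, repeat_count) for each tandem run of `motif`.

    Runs are found left to right without overlap; `min_repeats` is an
    assumed threshold, not a value taken from the data article.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return [(m.start(), len(m.group(0)) // len(motif))
            for m in pattern.finditer(sequence.upper())]


# Toy sequence: three TAGA repeats, a spacer, then two GATA repeats.
toy = "CC" + "TAGA" * 3 + "CC" + "GATA" * 2 + "GG"
for motif in MOTIFS:
    runs = find_ssr_runs(toy, motif)
    if runs:
        print(motif, runs)
```

    Several of the listed motifs are rotations of one another (for example TAGA, AGAT and GATA), so the same stretch of sequence can be reported under more than one motif, as the toy run shows.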

  7. Reappraising the Impact of Offending on Victimization: A Propensity Score Matching Approach.

    Science.gov (United States)

    Posick, Chad

    2017-05-01

    Existing evidence clearly supports an empirical connection between offending and victimization. Often called the "victim-offender overlap," this relationship holds for both sexes, across the life course, and across a wide range of countries and cultural environments. In addition, the relationship is sustained regardless of the study sample and statistical methods applied in the analyses of the sample data. However, there has yet to be a study that examines this relationship for violent and property crime using quasi-experimental methods accounting for a wide range of potential confounders including individual differences and cultural contexts. This study subjects the victim-offender relationship to testing through propensity score matching for both violent and property crimes using an international dataset. The results show that previous violent and theft offending increases the odds of victimization when matching on individual and contextual factors. This finding supports previous literature and suggests that delinquent behavior may act as a "switch" that exposes one to subsequent violent and theft victimization.

  8. Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences

    Directory of Open Access Journals (Sweden)

    Holland Barbara R

    2006-07-01

    Full Text Available. Background: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences, from which pairwise similarities and distances are computed in different ways, resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study. Results: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally, we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy. Conclusion: Using the most treelike distance matrices, as

  9. Single Cell Genomics and Transcriptomics for Unicellular Eukaryotes

    Energy Technology Data Exchange (ETDEWEB)

    Ciobanu, Doina; Clum, Alicia; Singh, Vasanth; Salamov, Asaf; Han, James; Copeland, Alex; Grigoriev, Igor; James, Timothy; Singer, Steven; Woyke, Tanja; Malmstrom, Rex; Cheng, Jan-Fang

    2014-03-14

    Despite their small size, unicellular eukaryotes have complex genomes with a high degree of plasticity that allows them to adapt quickly to environmental changes. Unicellular eukaryotes live with prokaryotes and higher eukaryotes, frequently in symbiotic or parasitic niches. To this day, their contribution to the dynamics of environmental communities remains to be understood. Unfortunately, the vast majority of eukaryotic microorganisms are either uncultured or unculturable, making genome sequencing impossible using traditional approaches. We have developed an approach to isolate unicellular eukaryotes of interest from environmental samples, and to sequence and analyze their genomes and transcriptomes. We have tested our methods with six species: an uncharacterized protist from cellulose-enriched compost identified as Platyophrya, a close relative of P. vorax; the fungus Metschnikowia bicuspidata, a parasite of the water flea Daphnia; the mycoparasitic fungus Piptocephalis cylindrospora, a parasite of Cokeromyces and Mucor; Caulochytrium protosteloides, a parasite of Sordaria; Rozella allomycis, a parasite of the water mold Allomyces; and the microalga Chlamydomonas reinhardtii. Here, we present the four components of our approach: pre-sequencing methods, sequence analysis for single cell genome assembly, sequence analysis of single cell transcriptomes, and genome annotation. This technology has the potential to uncover the complexity of single cell eukaryotes and their role in environmental samples.

  10. Brute-Force Approach for Mass Spectrometry-Based Variant Peptide Identification in Proteogenomics without Personalized Genomic Data

    Science.gov (United States)

    Ivanov, Mark V.; Lobas, Anna A.; Levitsky, Lev I.; Moshkovskii, Sergei A.; Gorshkov, Mikhail V.

    2018-02-01

    In a proteogenomic approach based on tandem mass spectrometry analysis of proteolytic peptide mixtures, customized exome or RNA-seq databases are employed for identifying protein sequence variants. However, the problem of variant peptide identification without personalized genomic data is important for a variety of applications. Following the recent proposal by Chick et al. (Nat. Biotechnol. 33, 743-749, 2015) on the feasibility of such variant peptide search, we evaluated two available approaches based on the previously suggested "open" search and the "brute-force" strategy. To improve the efficiency of these approaches, we propose an algorithm for exclusion of false variant identifications from the search results involving analysis of modifications mimicking single amino acid substitutions. Also, we propose a de novo based scoring scheme for assessment of identified point mutations. In the scheme, the search engine analyzes y-type fragment ions in MS/MS spectra to confirm the location of the mutation in the variant peptide sequence.

  11. Assessment of genomic relationship between Oryza sativa and ...

    African Journals Online (AJOL)

    2010-03-01

    Mar 1, 2010 ... For genomic in situ hybridization, genomic DNA from O. australiensis was used as probe for the mitotic and meiotic ... Wide hybridization is one of the plant breeding approaches ..... Disease and insect resistance in rice.

  12. Magnetic safety matches

    Science.gov (United States)

    Lindén, J.; Lindberg, M.; Greggas, A.; Jylhävuori, N.; Norrgrann, H.; Lill, J. O.

    2017-07-01

    In addition to the main ingredients (sulfur, potassium chlorate and carbon), ordinary safety matches contain various dyes, glues, etc., giving the head of the match an even texture and appealing color. Among the common reddish-brown matches there are several types which, after ignition, can be attracted by a strong magnet. Before ignition the match head is generally not attracted by the magnet. An elemental analysis based on proton-induced x-ray emission was performed to single out iron as the element responsible for the observed magnetism. 57Fe Mössbauer spectroscopy was used to identify the iron compounds present before and after ignition that are responsible for the macroscopic magnetism: Fe2O3 before and Fe3O4 after. The reaction was verified by mixing the main chemicals in the match head with Fe2O3 in glue and mounting the mixture on a match stick. The ash residue after igniting the mixture was magnetic.

  13. Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

    Science.gov (United States)

    S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali

    2014-01-01

    Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent technological revolution in the life sciences that allows scientists to decode genomes or transcriptomes much faster and at lower cost. Genomics-based studies are progressing relatively slowly in India owing to the lack of genomics experts, trained personnel and dedicated service providers. NGS offers great potential for studying India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics services to scientists, to train researchers and to work on national and international genomic projects. We have a HiSeq1000 from Illumina and a GS-FLX Plus from Roche454. The long reads from the GS FLX Plus and the high sequence depth from the HiSeq1000 together make an ideal hybrid approach for de novo sequencing and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising more than 388 genomes and 615 transcriptomes, covering prokaryotes and eukaryotes (fungi, plants and animals). In addition, we have optimized other unique applications such as small RNA (miRNA, siRNA, etc.), long mate-pair sequencing (2 to 20 kb), coding sequences (exome), methylome (ChIP-Seq), restriction mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons. Translating DNA sequence data from an NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to analyze NGS data, including genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP) and comparative analysis. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting in several publications in peer-reviewed journals and several genomic

  14. Image matching as a data source for forest inventory - Comparison of Semi-Global Matching and Next-Generation Automatic Terrain Extraction algorithms in a typical managed boreal forest environment

    Science.gov (United States)

    Kukkonen, M.; Maltamo, M.; Packalen, P.

    2017-08-01

    Image matching is emerging as a compelling alternative to airborne laser scanning (ALS) as a data source for forest inventory and management. There is currently an open discussion in the forest inventory community about whether, and to what extent, the new method can be applied to practical inventory campaigns. This paper aims to contribute to this discussion by comparing two different image matching algorithms (Semi-Global Matching [SGM] and Next-Generation Automatic Terrain Extraction [NGATE]) and ALS in a typical managed boreal forest environment in southern Finland. Spectral features from unrectified aerial images were included in the modeling and the potential of image matching in areas without a high resolution digital terrain model (DTM) was also explored. Plot level predictions for total volume, stem number, basal area, height of basal area median tree and diameter of basal area median tree were modeled using an area-based approach. Plot level dominant tree species were predicted using a random forest algorithm, also using an area-based approach. The statistical difference between the error rates from different datasets was evaluated using a bootstrap method. Results showed that ALS outperformed image matching with every forest attribute, even when a high resolution DTM was used for height normalization and spectral information from images was included. Dominant tree species classification with image matching achieved accuracy levels similar to ALS regardless of the resolution of the DTM when spectral metrics were used. Neither of the image matching algorithms consistently outperformed the other, but there were noticeably different error rates depending on the parameter configuration, spectral band, resolution of DTM, or response variable. This study showed that image matching provides reasonable point cloud data for forest inventory purposes, especially when a high resolution DTM is available and information from the understory is redundant.

  15. Bioinformatic approach in the identification of arabidopsis gene homologous in amaranthus

    Directory of Open Access Journals (Sweden)

    Jana Žiarovská

    2015-05-01

    Full Text Available Bioinformatics offers an efficient tool for molecular genetics applications, and sequence homology search algorithms have become an essential part of many research strategies. Appropriate management of known data stored in publicly available databases can serve research in many ways. Here, we report the identification, in Amaranthus, of the DNA sequence of an RmlC-like cupins superfamily protein that is known in the Arabidopsis genome; in Amaranthus this sequence had not yet been sequenced. A BLAST-based approach was used to identify homologous sequences in the nucleotide database and to find suitable parts of the Arabidopsis sequence where primers can be designed. In total, 64 hits were found in the nucleotide database for the Arabidopsis RmlC-like cupins sequence. Query cover ranged from 10% up to 100% between the RmlC-like cupins nucleotide sequence and its homologues currently stored in public nucleotide databases. The most conserved region was identified for matches that possess nucleotides in the range of 1506 to 1925 bp of the RmlC-like cupins DNA sequence stored in the database. The in silico approach was subsequently used in PCR analysis, where the specificity of the designed primers was confirmed. A unique, 250 bp long fragment was obtained for Amaranthus cruentus and a hybrid Amaranthus hypochondriacus x hybridus in our analysis. Bioinformatics-based analysis of unknown parts of plant genomes, as shown in this study, is a very good additional tool in PCR-based analysis of plant variability. This approach is suitable for plants where concrete genomic data are still missing for the genes of interest, as was demonstrated for Amaranthus.

  16. Big Data Analytics for Genomic Medicine.

    Science.gov (United States)

    He, Karen Y; Ge, Dongliang; He, Max M

    2017-02-15

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights through examining large-scale various data sets. While integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure exhibit challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from the EHRs for genomic medicine. We introduce possible solutions for different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.

  17. Multiscale modeling of three-dimensional genome

    Science.gov (United States)

    Zhang, Bin; Wolynes, Peter

    The genome, the blueprint of life, contains nearly all the information needed to build and maintain an entire organism. A comprehensive understanding of the genome is of paramount interest to human health and will advance progress in many areas, including life sciences, medicine, and biotechnology. The overarching goal of my research is to understand the structure-dynamics-function relationships of the human genome. In this talk, I will present our efforts towards that goal, with a particular emphasis on studying the three-dimensional organization of the genome with multi-scale approaches. Specifically, I will discuss the reconstruction of genome structures at both interphase and metaphase using data from chromosome conformation capture experiments. Computational modeling of the chromatin fiber at the atomistic level from first principles will also be presented as our effort to study genome structure from the bottom up.

  18. Exact Methods for Solving the Train Departure Matching Problem

    DEFF Research Database (Denmark)

    Haahr, Jørgen Thorlund; Bull, Simon Henry

    In this paper we consider the train departure matching problem, which is an important subproblem of the Rolling Stock Unit Management on Railway Sites problem introduced in the ROADEF/EURO Challenge 2014. The subproblem entails matching arriving train units to scheduled departing trains at a railway ... site while respecting multiple physical and operational constraints. In this paper we formally define that subproblem, prove its NP-hardness, and present two exact methods for solving the problem. First, we present a compact Mixed Integer Program formulation which we solve using a MIP solver...

  19. Reexamining microRNA site accessibility in Drosophila: a population genomics study.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    Full Text Available Kertesz et al. (Nature Genetics 2008) described PITA, a miRNA target prediction algorithm based on hybridization energy and site accessibility. In this note, we used a population genomics approach to reexamine their data and found that the PITA algorithm had lower specificity than methods based on evolutionary conservation at comparable levels of sensitivity. We also showed that deeply conserved miRNAs tend to have stronger hybridization energies to their targets than do other miRNAs. Although PITA had higher specificity in predicting targets than a naïve seed-match method, this signal was primarily due to the use of a single cutoff score for all miRNAs and to the observed correlation between conservation and hybridization energy. Overall, our results clarify the accuracy of different miRNA target prediction algorithms in Drosophila and the role of site accessibility in miRNA target prediction.
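
    The naïve seed-match baseline mentioned in this abstract is not specified there. A common convention, assumed here, is to take nucleotides 2 to 8 of the miRNA as the seed and to search the target transcript for the reverse complement of that seed. The function names and the toy UTR below are hypothetical; the miRNA is the well-known let-7a sequence.

```python
# Watson-Crick complement table for RNA.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=2, end=8):
    """Reverse complement of the miRNA seed (1-based positions start..end).

    The 2-8 seed window is an assumed convention, not a detail taken
    from the abstract.
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return seed.translate(RNA_COMPLEMENT)[::-1]


def find_seed_matches(utr, mirna):
    """0-based start positions of every perfect seed site in `utr`."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index)
        index = utr.find(site, index + 1)
    return positions


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGG"  # toy 3' UTR containing one site
print(seed_site(LET7A), find_seed_matches(toy_utr, LET7A))
```

    A matcher of this kind ignores hybridization energy and site accessibility entirely, which is what makes it a useful baseline for the comparison described in the abstract.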

  20. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    Science.gov (United States)

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health and more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing does not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
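
    The first of the two distance measures described in this abstract, counting single nucleotide polymorphisms in a core genome alignment, can be sketched as a pairwise count of differing columns. This is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, and columns containing a gap or an ambiguous base (N) are skipped. The isolate names and sequences are invented.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="-N"):
    """Number of aligned positions where two sequences differ.

    Positions where either sequence has a character in `ignore`
    are skipped (an assumed handling of gaps and ambiguous calls).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical core genome alignment of three isolates.
alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACCTACGNAT",
}
distances = pairwise_distances(alignment)
print(distances)
```

    As the abstract stresses, a count like this only becomes meaningful once it is interpreted against organism-specific knowledge, so no relatedness cutoff is built into the sketch.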

  1. Characterization of soybean genomic features by analysis of its expressed sequence tags

    DEFF Research Database (Denmark)

    Tian, Ai-Guo; Wang, Jun; Cui, Peng

    2004-01-01

    to be fast-evolving. Soybean unigenes with no match to genes within the Arabidopsis genome were identified as soybean-specific genes. These genes were mainly involved in nodule development and the synthesis of seed storage proteins. In addition, we also identified 61 genes regulated by salicylic acid, 1...

  2. Combining genetical genomics and bulked segregant analysis differential expression: an approach to gene localization

    NARCIS (Netherlands)

    Chen, Xinwei; Hedley, P.E.; Morris, J.; Liu, Hui; Niks, R.E.; Waugh, R.

    2011-01-01

    Positional gene isolation in unsequenced species generally requires either a reference genome sequence or an inference of gene content based on conservation of synteny with a genomic model. In the large unsequenced genomes of the Triticeae cereals the latter, i.e. conservation of synteny with the

  3. Genome evolution during progression to breast cancer

    KAUST Repository

    Newburger, D. E.; Kashef-Haghighi, D.; Weng, Z.; Salari, R.; Sweeney, R. T.; Brunner, A. L.; Zhu, S. X.; Guo, X.; Varma, S.; Troxell, M. L.; West, R. B.; Batzoglou, S.; Sidow, A.

    2013-01-01

    Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma.

  4. Genome evolution during progression to breast cancer

    KAUST Repository

    Newburger, D. E.

    2013-04-08

    Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma.

  5. Mapping genomic features to functional traits through microbial whole genome sequences.

    Science.gov (United States)

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologs (COGs) and used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
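
    The TF-IDF step mentioned in this abstract can be sketched as follows, treating each genome as a "document" and each COG as a "term". This is a textbook TF-IDF variant (relative frequency times the log of inverse document frequency); the abstract does not say which variant the authors used, and the genome names and COG identifiers below are hypothetical.

```python
import math


def tf_idf(counts):
    """counts: {genome: {cog: count}} -> {genome: {cog: TF-IDF weight}}."""
    n_genomes = len(counts)
    # Document frequency: number of genomes in which each COG occurs.
    doc_freq = {}
    for cog_counts in counts.values():
        for cog, c in cog_counts.items():
            if c > 0:
                doc_freq[cog] = doc_freq.get(cog, 0) + 1
    weights = {}
    for genome, cog_counts in counts.items():
        total = sum(cog_counts.values())
        weights[genome] = {
            # Term frequency (share of the genome's COG hits) times inverse document frequency.
            cog: (c / total) * math.log(n_genomes / doc_freq[cog])
            for cog, c in cog_counts.items()
            if c > 0
        }
    return weights


# Hypothetical COG counts for two genomes.
counts = {
    "genome_1": {"COG0001": 2, "COG0002": 2},
    "genome_2": {"COG0001": 1, "COG0003": 3},
}
weights = tf_idf(counts)
```

    A COG present in every genome (here `COG0001`) receives weight zero, which is the sense in which TF-IDF down-weights ubiquitous features before a feature-selection step.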

  6. A Genome-Wide Methylation Study of Severe Vitamin D Deficiency in African American Adolescents

    NARCIS (Netherlands)

    Zhu, Haidong; Wang, Xiaoling; Shi, Huidong; Su, Shaoyong; Harshfield, Gregory A.; Gutin, Bernard; Snieder, Harold; Dong, Yanbin

    Objectives To test the hypothesis that changes in DNA methylation are involved in vitamin D deficiency-related immune cell regulation using an unbiased genome-wide approach combined with a genomic and epigenomic integrative approach. Study design We performed a genome-wide methylation scan using the

  7. Composite Match Index with Application of Interior Deformation Field Measurement from Magnetic Resonance Volumetric Images of Human Tissues

    Directory of Open Access Journals (Sweden)

    Penglin Zhang

    2012-01-01

    Whereas a variety of feature-point matching approaches have been reported in computer vision, few have been applied to images of nonrigid, nonuniform human tissues. The present work is concerned with interior deformation field measurement of complex human tissues from three-dimensional magnetic resonance (MR) volumetric images. To improve the reliability of matching results, this paper proposes a composite match index (CMI) as the foundation of multimethod fusion. We discuss the definition, components, and weight determination of the CMI. To test the validity of the proposed approach, it is applied to actual MR volumetric images obtained from a volunteer’s calf. The main result is consistent with the actual condition.

  8. Accounting for discovery bias in genomic prediction

    Science.gov (United States)

    Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...

  9. Whole-Genome DNA Methylation Status Associated with Clinical PTSD Measures of OIF/OEF Veterans (Open Access)

    Science.gov (United States)

    2017-07-11

    OIF) veterans with PTSD and 51 age/ethnicity/gender-matched combat-exposed PTSD-negative controls. Agilent whole-genome array detected ~5600...exclusion criteria were used to identify a training set comprising 48 male veterans with PTSD (PTSD+) and 51 age-/ethnicity-/gender-matched controls...

  10. Comparative Reannotation of 21 Aspergillus Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain errors of various kinds, e.g. missing genes, incorrect exon-intron structures, 'chimeras' that fuse 2 or more real genes, or, conversely, real genes split into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models the one that most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome), supported by comparative analysis, additionally correcting ~18% of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.

  11. Computer face-matching technology using two-dimensional photographs accurately matches the facial gestalt of unrelated individuals with the same syndromic form of intellectual disability.

    Science.gov (United States)

    Dudding-Byth, Tracy; Baxter, Anne; Holliday, Elizabeth G; Hackett, Anna; O'Donnell, Sheridan; White, Susan M; Attia, John; Brunner, Han; de Vries, Bert; Koolen, David; Kleefstra, Tjitske; Ratwatte, Seshika; Riveros, Carlos; Brain, Steve; Lovell, Brian C

    2017-12-19

    Massively parallel genetic sequencing allows rapid testing of known intellectual disability (ID) genes. However, the discovery of novel syndromic ID genes requires molecular confirmation in at least a second or a cluster of individuals with an overlapping phenotype or similar facial gestalt. Using computer face-matching technology we report an automated approach to matching the faces of non-identical individuals with the same genetic syndrome within a database of 3681 images [1600 images of one of 10 genetic syndrome subgroups together with 2081 control images]. Using the leave-one-out method, two research questions were specified: 1) Using two-dimensional (2D) photographs of individuals with one of 10 genetic syndromes within a database of images, did the technology correctly identify more than expected by chance: i) a top match? ii) at least one match within the top five matches? or iii) at least one in the top 10 with an individual from the same syndrome subgroup? 2) Was there concordance between correct technology-based matches and whether two out of three clinical geneticists would have considered the diagnosis based on the image alone? The computer face-matching technology correctly identifies a top match, at least one correct match in the top five and at least one in the top 10 more than expected by chance (P syndromes except Kabuki syndrome. Although the accuracy of the computer face-matching technology was tested on images of individuals with known syndromic forms of intellectual disability, the results of this pilot study illustrate the potential utility of face-matching technology within deep phenotyping platforms to facilitate the interpretation of DNA sequencing data for individuals who remain undiagnosed despite testing the known developmental disorder genes.

  12. Genomics of Salmonella Species

    Science.gov (United States)

    Canals, Rocio; McClelland, Michael; Santiviago, Carlos A.; Andrews-Polymenis, Helene

    Progress in the study of Salmonella survival, colonization, and virulence has increased rapidly with the advent of complete genome sequencing and higher capacity assays for transcriptomic and proteomic analysis. Although many of these techniques have yet to be used to directly assay Salmonella growth on foods, these assays are currently in use to determine Salmonella factors necessary for growth in animal models including livestock animals and in in vitro conditions that mimic many different environments. As sequencing of the Salmonella genome and microarray analysis have revolutionized genomics and transcriptomics of salmonellae over the last decade, so are new high-throughput sequencing technologies currently accelerating the pace of our studies and allowing us to approach complex problems that were not previously experimentally tractable.

  13. Chemical biology on the genome.

    Science.gov (United States)

    Balasubramanian, Shankar

    2014-08-15

    In this article I discuss studies towards understanding the structure and function of DNA in the context of genomes from the perspective of a chemist. The first area I describe concerns the studies that led to the invention and subsequent development of a method for sequencing DNA on a genome scale at high speed and low cost, now known as Solexa/Illumina sequencing. The second theme will feature the four-stranded DNA structure known as a G-quadruplex with a focus on its fundamental properties, its presence in cellular genomic DNA and the prospects for targeting such a structure in cells with small molecules. The final topic for discussion is naturally occurring chemically modified DNA bases with an emphasis on chemistry for decoding (or sequencing) such modifications in genomic DNA. The genome is a fruitful topic to be further elucidated by the creation and application of chemical approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. CoGI: Towards Compressing Genomes as an Image.

    Science.gov (United States)

    Xie, Xiaojing; Zhou, Shuigeng; Guan, Jihong

    2015-01-01

    Genomic science is now facing an explosive increase of data thanks to the fast development of sequencing technology. This situation poses serious challenges to genomic data storage and transferring. It is desirable to compress data to reduce storage and transferring cost, and thus to boost data distribution and utilization efficiency. Up to now, a number of algorithms / tools have been developed for compressing genomic sequences. Unlike the existing algorithms, most of which treat genomes as one-dimensional text strings and compress them based on dictionaries or probability models, this paper proposes a novel approach called CoGI (the abbreviation of Compressing Genomes as an Image) for genome compression, which transforms the genomic sequences to a two-dimensional binary image (or bitmap), then applies a rectangular partition coding algorithm to compress the binary image. CoGI can be used as either a reference-based compressor or a reference-free compressor. For the former, we develop two entropy-based algorithms to select a proper reference genome. Performance evaluation is conducted on various genomes. Experimental results show that the reference-based CoGI significantly outperforms two state-of-the-art reference-based genome compressors GReEn and RLZ-opt in both compression ratio and compression efficiency. It also achieves comparable compression ratio but two orders of magnitude higher compression efficiency in comparison with XM--one state-of-the-art reference-free genome compressor. Furthermore, our approach performs much better than Gzip--a general-purpose and widely-used compressor, in both compression speed and compression ratio. So, CoGI can serve as an effective and practical genome compressor. The source code and other related documents of CoGI are available at: http://admis.fudan.edu.cn/projects/cogi.htm.
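
    The idea of turning a genomic sequence into a two-dimensional binary image can be illustrated with a small sketch. The 2-bit code per base (A=00, C=01, G=10, T=11), the fixed row width, and the zero padding are assumptions made here for illustration; the abstract does not specify CoGI's actual mapping, and the rectangular partition coding step is not shown.

```python
# Hypothetical 2-bit code per base; CoGI's real mapping may differ.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def sequence_to_bitmap(seq, width):
    """Lay the 2-bit codes of `seq` out row by row in a `width`-column grid of 0/1 values."""
    bits = [bit for base in seq.upper() for bit in BASE_BITS[base]]
    # Pad the last row with zeros so every row has the same length.
    bits.extend([0] * (-len(bits) % width))
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def bitmap_to_sequence(bitmap, n_bases):
    """Invert sequence_to_bitmap; n_bases is needed to discard the padding."""
    bits = [bit for row in bitmap for bit in row][: 2 * n_bases]
    inverse = {code: base for base, code in BASE_BITS.items()}
    return "".join(inverse[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2))


bitmap = sequence_to_bitmap("ACGTTG", width=8)
restored = bitmap_to_sequence(bitmap, n_bases=6)
```

    Because the mapping is lossless once the sequence length is stored, any compression gain would have to come from the image coder applied afterwards.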

  15. Annotation of the Clostridium Acetobutylicum Genome

    Energy Technology Data Exchange (ETDEWEB)

    Daly, M. J.

    2004-06-09

    The genome sequence of the solvent producing bacterium Clostridium acetobutylicum ATCC824, has been determined by the shotgun approach. The genome consists of a 3.94 Mb chromosome and a 192 kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases, closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria.

  16. Fingerprint matching algorithm for poor quality images

    Directory of Open Access Journals (Sweden)

    Vedpal Singh

    2015-04-01

    The main aim of this study is to establish an efficient platform for fingerprint matching for low-quality images. Generally, fingerprint matching approaches use the minutiae points for authentication. However, this is not a reliable authentication method for low-quality images. To overcome this problem, the current study proposes a fingerprint matching methodology based on normalised cross-correlation, which would improve the performance, reduce miscalculations during authentication, and decrease the computational complexity. The error rate of the proposed method is 5.4%, which is less than the two-dimensional (2D) dynamic programming (DP) error rate of 5.6%, while Lee's method produces 5.9% and the combined method has a 6.1% error rate. Genuine accept rate at 1% false accept rate is 89.3% but at 0.1% value it is 96.7%, which is higher. The outcome of this study suggests that the proposed methodology has a low error rate with minimum computational effort as compared with existing methods such as Lee's method, 2D DP, and the combined method.
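
    Normalised cross-correlation, the similarity measure named in this abstract, can be sketched for two equally sized grayscale patches as below. This is the standard zero-mean definition, not the authors' full matching pipeline; the patch values are hypothetical, and returning 0.0 for a constant patch is a convention chosen here.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-mean NCC of two equally sized grayscale images (lists of rows); result lies in [-1, 1]."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant image has zero variance; report no correlation instead of dividing by zero.
    return num / den if den else 0.0


# Hypothetical 2x2 patches: a brightness/contrast change and an inverted copy.
patch = [[10, 20], [30, 40]]
brighter = [[2 * p + 5 for p in row] for row in patch]
inverted = [[40, 30], [20, 10]]
score_same = normalized_cross_correlation(patch, brighter)
score_inverted = normalized_cross_correlation(patch, inverted)
```

    Subtracting the means and dividing by the spread makes the score insensitive to uniform brightness and contrast changes, which is one reason the measure is attractive for low-quality images.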

  17. Origins of the Human Genome Project.

    Science.gov (United States)

    Watson, J D; Cook-Deegan, R M

    1991-01-01

    The Human Genome Project has become a reality. Building on a debate that dates back to 1985, several genome projects are now in full stride around the world, and more are likely to form in the next several years. Italy began its genome program in 1987, and the United Kingdom and U.S.S.R. in 1988. The European communities mounted several genome projects on yeast, bacteria, Drosophila, and Arabidopsis thaliana (a rapidly growing plant with a small genome) in 1988, and in 1990 commenced a new 2-year program on the human genome. In the United States, we have completed the first year of operation of the National Center for Human Genome Research at the National Institutes of Health (NIH), now the largest single funding source for genome research in the world. There have been dedicated budgets focused on genome-scale research at NIH, the U.S. Department of Energy, and the Howard Hughes Medical Institute for several years, and results are beginning to accumulate. There were three annual meetings on genome mapping and sequencing at Cold Spring Harbor, New York, in the spring of 1988, 1989, and 1990; the talks have shifted from a discussion about how to approach problems to presenting results from experiments already performed. We have finally begun to work rather than merely talk. The purpose of genome projects is to assemble data on the structure of DNA in human chromosomes and those of other organisms. A second goal is to develop new technologies to perform mapping and sequencing. There have been impressive technical advances in the past 5 years since the debate about the human genome project began. We are on the verge of beginning pilot projects to test several approaches to sequencing long stretches of DNA, using both automation and manual methods. Ordered sets of yeast artificial chromosome and cosmid clones have been assembled to span more than 2 million base pairs of several human chromosomes, and a region of 10 million base pairs has been assembled for

  18. The Global Invertebrate Genomics Alliance (GIGA): Developing Community Resources to Study Diverse Invertebrate Genomes

    KAUST Repository

    Bracken-Grissom, Heather

    2013-12-12

    Over 95% of all metazoan (animal) species comprise the invertebrates, but very few genomes from these organisms have been sequenced. We have, therefore, formed a Global Invertebrate Genomics Alliance (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site has been launched to facilitate this collaborative venture.

  19. Matching factorization theorems with an inverse-error weighting

    Science.gov (United States)

    Echevarria, Miguel G.; Kasemets, Tomas; Lansberg, Jean-Philippe; Pisano, Cristian; Signori, Andrea

    2018-06-01

    We propose a new fast method to match factorization theorems applicable in different kinematical regions, such as the transverse-momentum-dependent and the collinear factorization theorems in Quantum Chromodynamics. At variance with well-known approaches relying on their simple addition and subsequent subtraction of double-counted contributions, ours simply builds on their weighting using the theory uncertainties deduced from the factorization theorems themselves. This allows us to estimate the unknown complete matched cross section from an inverse-error-weighted average. The method is simple and provides an evaluation of the theoretical uncertainty of the matched cross section associated with the uncertainties from the power corrections to the factorization theorems (additional uncertainties, such as the nonperturbative ones, should be added for a proper comparison with experimental data). Its usage is illustrated with several basic examples, such as Z boson, W boson, H0 boson and Drell-Yan lepton-pair production in hadronic collisions, and compared to the state-of-the-art Collins-Soper-Sterman subtraction scheme. It is also not limited to the transverse-momentum spectrum, and can straightforwardly be extended to match any (un)polarized cross section differential in other variables, including multi-differential measurements.
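
    The inverse-error-weighted average described in this abstract can be sketched as below. The sketch assumes the usual inverse-variance form, with weights proportional to one over the squared uncertainty and a combined uncertainty of one over the square root of the summed weights; the input numbers are hypothetical and do not come from the paper.

```python
import math


def inverse_error_weighted(values, errors):
    """Combine estimates, weighting each by 1 / error**2; returns (average, combined_error)."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    average = sum(w * v for w, v in zip(weights, values)) / total
    # Assumed uncertainty of the combination: 1 / sqrt(sum of inverse squared errors).
    combined_error = 1.0 / math.sqrt(total)
    return average, combined_error


# Hypothetical predictions from two factorization regimes at one kinematic point.
matched, matched_error = inverse_error_weighted([10.0, 14.0], [1.0, 3.0])
```

    Where one theorem's uncertainty is much smaller than the other's, the average is dominated by that prediction, so the result moves smoothly between the two regimes without an explicit subtraction of double-counted terms.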

  20. Laser Stimulated Genomic Exchange in Stem Cells. Laser Non-cloning Techniques

    Science.gov (United States)

    Stefan, V. Alexander

    2012-02-01

    I propose a novel technique for pluripotent stem cell generation. Genomic exchange is stimulated by the beat-wave free electron laser (B-W FEL), frequency matching with the frequencies of the DNA [J.D. Watson and F. H. C. Crick, Nature, 171, 737-738 (1953)] eigen-oscillations. B-W FEL-1 [V. Stefan, B.I. Cohen, C. Joshi, Science, 243, 4890 (Jan 27, 1989); Stefan, et al., Bull. APS. 32, No. 9, 1713 (1987); Stefan, APS March-2011, #S1.143; APS- March-2009, #K1.276] scans the entire stem cell; B-W FEL-2 probes the chromosomes. The scanning and probing lasers: 300-500 nm and 100-300 nm, respectively; irradiances: the order-of-10s mW/cm^2 (above the threshold value for a particular gene structure); repetition rate of few-100s Hz. A variety of genetic-matching conditions can be arranged. Genomic glitches (the cell nucleus transfer [Scott Noggle et al., Nature, 478, 70-75 (06 October 2011)]) can be hedged by the use of lasers.

  1. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    Science.gov (United States)

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  3. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    Directory of Open Access Journals (Sweden)

    Jiří Macas

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  4. Identifying Loci for the Overlap between Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder Using a Genome-Wide QTL Linkage Approach

    Science.gov (United States)

    Nijmeijer, Judith S.; Arias-Vasquez, Alejandro; Rommelse, Nanda N. J.; Altink, Marieke E.; Anney, Richard J. L.; Asherson, Philip; Banaschewski, Tobias; Buschgens, Cathelijne J. M.; Fliers, Ellen A.; Gill, Michael; Minderaa, Ruud B.; Poustka, Luise; Sergeant, Joseph A.; Buitelaar, Jan K.; Franke, Barbara; Ebstein, Richard P.; Miranda, Ana; Mulas, Fernando; Oades, Robert D.; Roeyers, Herbert; Rothenberger, Aribert; Sonuga-Barke, Edmund J. S.; Steinhausen, Hans-Christoph; Faraone, Stephen V.; Hartman, Catharina A.; Hoekstra, Pieter J.

    2010-01-01

    Objective: The genetic basis for autism spectrum disorder (ASD) symptoms in children with attention-deficit/hyperactivity disorder (ADHD) was addressed using a genome-wide linkage approach. Method: Participants of the International Multi-Center ADHD Genetics study comprising 1,143 probands with ADHD and 1,453 siblings were analyzed. The total and…

  5. Big Data Analytics for Genomic Medicine

    Science.gov (United States)

    He, Karen Y.; Ge, Dongliang; He, Max M.

    2017-01-01

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients’ genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights through examining large-scale various data sets. While integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure exhibit challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from the EHRs for genomic medicine. We introduce possible solutions for different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:28212287

  6. Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

    Science.gov (United States)

    Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

    2017-11-20

    Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP, as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. Upon treatment with the histone deacetylase inhibitor Trichostatin A, genes near this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high-complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.

  7. Best matching theory & applications

    CERN Document Server

    Moghaddam, Mohsen

    2017-01-01

    Mismatch or best match? This book demonstrates that best matching of individual entities to each other is essential to ensure smooth conduct and successful competitiveness in any distributed system, natural and artificial. Interactions must be optimized through best matching in planning and scheduling, enterprise network design, transportation and construction planning, recruitment, problem solving, selective assembly, team formation, sensor network design, and more. Fundamentals of best matching in distributed and collaborative systems are explained by providing: § Methodical analysis of various multidimensional best matching processes § Comprehensive taxonomy, comparing different best matching problems and processes § Systematic identification of systems’ hierarchy, nature of interactions, and distribution of decision-making and control functions § Practical formulation of solutions based on a library of best matching algorithms and protocols, ready for direct applications and apps development. Design...

  8. GENOME-BASED MODELING AND DESIGN OF METABOLIC INTERACTIONS IN MICROBIAL COMMUNITIES

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Mahadevan

    2012-10-01

    With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  9. Coincident site lattice-matched InGaN on (111) spinel substrates

    International Nuclear Information System (INIS)

    Norman, A. G.; Dippo, P. C.; Moutinho, H. R.; Simon, J.; Ptak, A. J.

    2012-01-01

    Coincident site lattice-matched wurtzite (0001) In0.31Ga0.69N, emitting in the important green wavelength region, is demonstrated by molecular beam epitaxy on a cubic (111) MgAl2O4 spinel substrate. The coincident site lattice matching condition involves a 30° rotation between the lattice of the InGaN epitaxial layer and the lattice of the spinel. This work describes an alternative approach towards realizing more compositionally homogeneous InGaN films with low dislocation density emitting in the "green gap" of low efficiency currently observed for semiconductor light emitting diodes (LEDs). This approach could lead to higher efficiency green LEDs presently of great interest for solid-state lighting applications.

  10. Comparative genomics of the Bifidobacterium breve taxon.

    Science.gov (United States)

    Bottacini, Francesca; O'Connell Motherway, Mary; Kuczynski, Justin; O'Connell, Kerry Joan; Serafini, Fausta; Duranti, Sabrina; Milani, Christian; Turroni, Francesca; Lugli, Gabriele Andrea; Zomer, Aldert; Zhurina, Daria; Riedel, Christian; Ventura, Marco; van Sinderen, Douwe

    2014-03-01

    Bifidobacteria are commonly found as part of the microbiota of the gastrointestinal tract (GIT) of a broad range of hosts, where their presence is positively correlated with the host's health status. In this study, we assessed the genomes of thirteen representatives of Bifidobacterium breve, which is not only a frequently encountered component of the (adult and infant) human gut microbiota, but can also be isolated from human milk and the vagina. In silico analysis of genome sequences from thirteen B. breve strains isolated from different environments (infant and adult faeces, human milk, human vagina) shows that the genetic variability of this species principally consists of hypothetical genes and mobile elements, but, interestingly, also genes correlated with the adaptation to host environment and gut colonization. These latter genes specify the biosynthetic machinery for sortase-dependent pili and exopolysaccharide production, and include genes that provide protection against invasion of foreign DNA (i.e. CRISPR loci and restriction/modification systems) and genes that encode enzymes responsible for carbohydrate fermentation. Gene-trait matching analysis showed clear correlations between known metabolic capabilities and characterized genes, and it also allowed the identification of a gene cluster involved in the utilization of the sugar alcohol sorbitol. Genome analysis of thirteen representatives of the B. breve species revealed that the deduced pan-genome exhibits an essentially closed trend. For this reason our analyses suggest that this number of B. breve representatives is sufficient to fully describe the pan-genome of this species. Comparative genomics also facilitated the genetic explanation for differential carbon source utilization phenotypes previously observed in different strains of B. breve.

  11. Acquiring Reference Genomes from Uncultured Microbes by Micromanipulation and Low-complexity Metagenomics

    DEFF Research Database (Denmark)

    Karst, Søren Michael; Albertsen, Mads; Nielsen, Jeppe Lund

    A prerequisite for many of the –omics approaches applied in environmental microbiology today is high-quality reference genomes. Until recently such genomes have been difficult to obtain from unculturable, complex microbial communities. However, lately the ‘single cell genomics’ approach based...... on isolation and amplification of genomic DNA from a single or few clonal cells has proven efficient for this purpose although very tedious. The aim of this study was to apply the methodology of single cell genomics to filamentous organisms and microcolonies of specific species from microbial communities...

  12. Genome-Wide Distribution, Organisation and Functional Characterization of Disease Resistance and Defence Response Genes across Rice Species

    Science.gov (United States)

    Singh, Sangeeta; Chand, Suresh; Singh, N. K.; Sharma, Tilak Raj

    2015-01-01

    The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease-resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analyses were done for R- and DR-genes across three species of rice, viz. Oryza sativa ssp. indica cv. 93-11, Oryza sativa ssp. japonica and the wild rice species Oryza brachyantha. We used an in silico approach to identify and map 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, and 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in indica and japonica genomes, respectively. The phylogenetic analysis along with motif distribution shows a high degree of conservation of R- and DR-genes in clusters. In silico expression analysis of R-genes and DR-genes showed that more than 85% were expressed genes with corresponding EST matches in the databases. This study gave special emphasis to mechanisms of gene evolution and duplication for R- and DR-genes across species. Analysis of paralogs across rice species indicated 17% and 4.38% R-genes, 29% and 11.63% DR-genes duplication in indica and Oryza brachyantha, as compared to 20% and 26% duplication of R-genes and DR-genes in japonica respectively. We found that during the course of duplication only 9.5% of R- and DR-genes changed their function and the rest of the genes have maintained their identity. Syntenic relationships across the three genomes indicated that more orthology is shared between the indica and japonica genomes than with the brachyantha genome. Genome-wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and in understanding the molecular mechanism of disease resistance and its evolution in rice and related species. PMID:25902056

  13. A Semantic Analysis of XML Schema Matching for B2B Systems Integration

    Science.gov (United States)

    Kim, Jaewook

    2011-01-01

    One of the most critical steps to integrating heterogeneous e-Business applications using different XML schemas is schema matching, which is known to be costly and error-prone. Many automatic schema matching approaches have been proposed, but the challenge is still daunting because of the complexity of schemas and immaturity of technologies in…

  14. PIV uncertainty quantification by image matching

    International Nuclear Information System (INIS)

    Sciacchitano, Andrea; Scarano, Fulvio; Wieneke, Bernhard

    2013-01-01

    A novel method is presented to quantify the uncertainty of PIV data. The approach is a posteriori, i.e. the unknown actual error of the measured velocity field is estimated using the velocity field itself as input along with the original images. The principle of the method relies on the concept of super-resolution: the image pair is matched according to the cross-correlation analysis and the residual distance between matched particle image pairs (particle disparity vector) due to incomplete match between the two exposures is measured. The ensemble of disparity vectors within the interrogation window is analyzed statistically. The dispersion of the disparity vector returns the estimate of the random error, whereas the mean value of the disparity indicates the occurrence of a systematic error. The validity of the working principle is first demonstrated via Monte Carlo simulations. Two different interrogation algorithms are considered, namely the cross-correlation with discrete window offset and the multi-pass with window deformation. In the simulated recordings, the effects of particle image displacement, its gradient, out-of-plane motion, seeding density and particle image diameter are considered. In all cases good agreement is retrieved, indicating that the error estimator is able to follow the trend of the actual error with satisfactory precision. Experiments where time-resolved PIV data are available are used to prove the concept under realistic measurement conditions. In this case the ‘exact’ velocity field is unknown; however a high accuracy estimate is obtained with an advanced interrogation algorithm that exploits the redundant information of highly temporally oversampled data (pyramid correlation, Sciacchitano et al (2012 Exp. Fluids 53 1087–105)). The image-matching estimator returns the instantaneous distribution of the estimated velocity measurement error. The spatial distribution compares very well with that of the actual error with maxima in the
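    The statistical step described in this abstract (the mean of the disparity ensemble as an indicator of systematic error, its dispersion as an indicator of random error) can be illustrated with a minimal Python sketch. The function name, the one-component input, and in particular the way the two contributions are combined (root-sum-square of the mean and the standard error of the mean) are assumptions made for illustration; the abstract itself does not give the formula.

```python
import math
import statistics


def disparity_error_estimate(disparities):
    """Summarise one velocity component of the disparity vectors in a window.

    Returns (mean, spread, combined):
      mean     - average disparity, read here as the systematic part
      spread   - sample standard deviation, read here as the random part
      combined - ASSUMED combination: sqrt(mean^2 + (spread / sqrt(N))^2)
    """
    n = len(disparities)
    if n < 2:
        raise ValueError("need at least two disparity values")
    mean = statistics.fmean(disparities)
    spread = statistics.stdev(disparities)
    # Hypothetical combination rule; not stated in the abstract.
    combined = math.hypot(mean, spread / math.sqrt(n))
    return mean, spread, combined


# Toy disparities in pixels (invented values, not measured data).
mean, spread, combined = disparity_error_estimate([0.1, -0.1, 0.1, -0.1])
```

    In this toy case the disparities cancel on average, so the estimate is driven entirely by the dispersion term.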

  15. Exploration of plant genomes in the FLAGdb++ environment

    Directory of Open Access Journals (Sweden)

    Leplé Jean-Charles

    2011-03-01

    Background: In the contexts of genomics, post-genomics and systems biology approaches, data integration presents a major concern. Databases provide crucial solutions: they store, organize and allow information to be queried, they enhance the visibility of newly produced data by comparing them with previously published results, and facilitate the exploration and development of both existing hypotheses and new ideas. Results: The FLAGdb++ information system was developed with the aim of using whole plant genomes as physical references in order to gather and merge available genomic data from in silico or experimental approaches. Available through a JAVA application, original interfaces and tools assist the functional study of plant genes by considering them in their specific context: chromosome, gene family, orthology group, co-expression cluster and functional network. FLAGdb++ is mainly dedicated to the exploration of large gene groups in order to decipher functional connections, to highlight shared or specific structural or functional features, and to facilitate translational tasks between plant species (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera). Conclusion: Combining original data with the output of experts and graphical displays that differ from classical plant genome browsers, FLAGdb++ presents a powerful complementary tool for exploring plant genomes and exploiting structural and functional resources, without the need for computer programming knowledge. First launched in 2002, a 15th version of FLAGdb++ is now available and comprises four model plant genomes and over eight million genomic features.

  16. Assembling networks of microbial genomes using linear programming.

    Science.gov (United States)

    Holloway, Catherine; Beiko, Robert G

    2010-11-20

    Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem. We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas. The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more-intensive phylogenomic analysis.

  17. Open Access Data Sharing in Genomic Research

    Directory of Open Access Journals (Sweden)

    Stacey Pereira

    2014-08-01

    The current emphasis on broad sharing of human genomic data generated in research in order to maximize utility and public benefit is a significant legacy of the Human Genome Project. Concerns about privacy and discrimination have led to policy responses that restrict access to genomic data as the means for protecting research participants. Our research and experience show, however, that a considerable number of research participants agree to open access sharing of their genomic data when given the choice. General policies that limit access to all genomic data fail to respect the autonomy of these participants and, at the same time, unnecessarily limit the utility of the data. We advocate instead a more balanced approach that allows for individual choice and encourages informed decision making, while protecting against the misuse of genomic data through enhanced legislation.

  18. Analysis of the genome sequence of Phomopsis longicolla: a fungal pathogen causing Phomopsis seed decay in soybean.

    Science.gov (United States)

    Li, Shuxian; Darwish, Omar; Alkharouf, Nadim W; Musungu, Bryan; Matthews, Benjamin F

    2017-09-05

    Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla) is a seed-borne fungus causing Phomopsis seed decay in soybean. This disease is one of the most devastating diseases reducing soybean seed quality worldwide. To facilitate investigation of the genomic basis of pathogenicity and to understand the mechanism of disease development, the genome of an isolate, MSPL10-6, from Mississippi, USA was sequenced, de novo assembled, and analyzed. The genome of MSPL10-6 was estimated to be approximately 62 Mb in size with an overall G + C content of 48.6%. Of 16,597 predicted genes, 9866 genes (59.45%) had significant matches to genes in the NCBI nr database, while 18.01% of them did not link to any gene ontology classification, and 9.64% of genes did not significantly match any known genes. Analysis of the 1221 putative genes that encoded carbohydrate-active enzymes (CAZymes) indicated that 715 genes belong to three classes of CAZyme that have a direct role in degrading plant cell walls. A novel fungal ulvan lyase (PL24; EC 4.2.2.-) was identified. Approximately 12.7% of the P. longicolla genome consists of repetitive elements. A total of 510 potentially horizontally transferred genes were identified. They appeared to originate from 22 other fungi, 26 eubacteria and 5 archaebacteria. The genome of the P. longicolla isolate MSPL10-6 represents the first reported genome sequence in the fungal Diaporthe-Phomopsis complex causing soybean diseases. The genome contained a number of Pfams not described previously. Information obtained from this study enhances our knowledge about this seed-borne pathogen and will facilitate further research on the genomic basis and pathogenicity mechanism of P. longicolla and aid in the development of improved strategies for efficient management of Phomopsis seed decay in soybean.

  19. Recurrent DNA inversion rearrangements in the human genome

    DEFF Research Database (Denmark)

    Flores, Margarita; Morales, Lucía; Gonzaga-Jauregui, Claudia

    2007-01-01

    Several lines of evidence suggest that reiterated sequences in the human genome are targets for nonallelic homologous recombination (NAHR), which facilitates genomic rearrangements. We have used a PCR-based approach to identify breakpoint regions of rearranged structures in the human genome...... to human genomic variation is discussed........ In particular, we have identified intrachromosomal identical repeats that are located in reverse orientation, which may lead to chromosomal inversions. A bioinformatic workflow pathway to select appropriate regions for analysis was developed. Three such regions overlapping with known human genes, located...

  20. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

    Science.gov (United States)

    Argout, X; Martin, G; Droc, G; Fouet, O; Labadie, K; Rivals, E; Aury, J M; Lanaud, C

    2017-09-15

    Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large insert size mate paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 size increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNAseq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub (http://cocoa-genome-hub.southgreen.fr).
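    The scaffold N50 statistic quoted in this abstract is, by its usual definition, the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name and the toy scaffold lengths are assumptions for illustration and are not taken from the cacao assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths in arbitrary units (invented, not the published assembly).
toy_scaffolds = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_scaffolds)  # 6 + 5 = 11 covers half of 20, so N50 is 5
```

    Merging scaffolds leaves the total length roughly unchanged while making the longest pieces longer, which is why a drop in scaffold count usually goes together with a rise in N50.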